CHANGES IN ATTITUDES DURING COLLEGE


HAROLD WEBSTER 
Vassar College 


For several years, the Mary Conover 
Mellon Foundation has supported a re- 
search program, the main purpose of 
which is to increase our understanding of 
the processes of learning and personality 
development in undergraduates who at-
tend a women’s liberal arts college.* The
research has made use of a variety of in- 
formation concerning both students and 
alumnae (15, 16, 18). Data have been 
collected by means of interviews, testing, 
and general observation. 

The present paper reports some results 
of the personality testing program. Al- 
though at the research level there is al- 
ways some confusion in distinguishing be- 
tween attitudes, interests, values, and 
the like, it seems reasonable to regard 
most personality inventory responses as 
expressions of attitudes, including atti- 
tudes about the self. The test items to be 
discussed concern human feelings and ex- 
periences which are quite general; for 
some purposes they are probably supe- 
rior to the more specific kinds of items 
found in interest inventories (3, 13). 

Students undergo changes in attitudes, 
in widely varying degrees, while attending 
college (1, 5, 7, 8, 13, 14, 15). It 
is difficult, however, to relate such changes 
directly to college education; during late 


* During the present study, research staff 
members included John Bushnell, Mervin 
Freedman, Richard Jung, Nevitt Sanford, 
coordinator, and the writer. Donald Brown, 
Department of Psychology, Bryn Mawr Col- 
lege, has devoted five summers to work on 
the project. 


adolescence some variations in attitudes 
undoubtedly have little to do with formal 
educational experience. Intellectual matu- 
ration continues well into the college years 
(4, 11), but its effect on attitudes at 
this age level is largely unknown. 

Methods for measuring longitudinal 
change involve some serious problems. 
Corey (2) pointed out the error in 
equating observed differences between 
classes with changes which might have 
been found by later retesting the same 
students. True change may also be ob- 
scured, however, in retesting the same per- 
sons, either because it is confounded with 
measurement error (10), or because its 
sources have not been identified experi- 
mentally. If the internal consistency of the 
measures is low, longitudinal comparisons 
for individuals may not be meaningful, 
for true changes occurring simultaneously 
in more than one kind of attitude may 
then go undetected. This is the case even 
if “comparable forms” are employed for 
the testing. In the present paper “reli- 
ability” will always refer to internal con- 
sistency (test homogeneity). 


CONSTRUCTION OF A DEVELOPMENTAL
SCALE

The initial selection of items and scales 
for an experimental battery was neces- 
sarily diverse because the aims of the re- 
search were quite general. The first bat- 
tery, administered in 1952 by Nevitt 
Sanford, contained many verbal and non- 
verbal personality items (15), which, to- 
gether with certain college entrance data, 




made possible comprehensive descriptions 
of students. Subsequently, a revision, 
which contained 677 verbal items, was 
administered to freshman and senior 
classes for four consecutive years. The 
battery was eventually revised a second 
time so that several new scales, including 
the one to be described, could be scored 
from it (18). 

From studies of both interview and test 
data, it was evident early in the research 
that more could be learned by studying 
changes in students, rather than by focus- 
ing on relatively permanent aspects of 
personality. This approach was also be- 
lieved more likely to lead directly to facts 
of importance for educators who are pri- 
marily concerned with inducing special 
kinds of change. It appeared that much 
personality development and reorganiza- 
tion was taking place in freshman subjects. 
As a result we began to study test mate-
rial to which freshmen responded differ-
ently from older students.

The Developmental Scale therefore con- 
tains personality inventory items found 
to discriminate graduating seniors from 
entering freshmen. As previously reported 
(15, 18) the scale was made up of items 
selected by comparing concurrent classes 
of seniors and freshmen. The present paper 
presents items which functioned best in 
single tests, and identifies those remaining 
after further cross-validation using a 
sample of 274 students twice, first as 
freshmen and later as seniors. Items which 
have survived both kinds of validation are 
likely to be of general interest. The pro- 
cedures used will be described only briefly. 

For the first sample (1953), 220 of the 
677 items discriminated the classes (441 
freshmen and 237 seniors) at the .05 level 
of significance, the majority of them also 
reaching the .001 level. Of these, 197 
items with means in the range .09 to
.91, inclusive, were retained for the test.
In a second sample (1954, 225 freshmen 
and 192 seniors, selected randomly) 123 of 
the 197 items were still functioning, ac-
cording to the same criteria.

At this point in the research two kinds 
of evidence suggested that items validated 
by using scores for the same persons 
would not differ markedly from items 
already selected using data from con- 
current classes: Mervin Freedman in an 
unpublished study reported no significant 
differences in performance on the items 
between those who remained and those 
who withdrew from college; and the vari- 
ances for senior total scale scores (based 
on the 123 items) were significantly larger 
than the variances for freshmen, an effect 
opposite to that which would be observed 
if there were increased homogeneity due 
to withdrawals from college. 

Subsequently most of the 1953 fresh- 
men became seniors and were retested 
with the same battery shortly before 
graduation in 1957. The test-retest data 
(N = 274) show that most of the items 
also function satisfactorily for the same 
persons. 

An approximate statistical test for cor-
related frequencies, based on an exact test
by McNemar (12, p. 56), was used for
identifying items, responses to which dif- 
fered significantly in the same sample for 
the two occasions. 
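
As an illustration of the kind of test described above, the following is a minimal sketch of a test for correlated frequencies on a single item. It assumes the usual McNemar setup, in which only students who changed their response between the two occasions carry information. The function name, the continuity correction, and the counts in the usage line are hypothetical; the paper reports only marginal percentages and does not state which approximation was used.

```python
from math import comb


def mcnemar_test(b, c):
    """Test for correlated frequencies on one true-false item.

    b: students who gave the keyed response as freshmen but not as seniors
    c: students who gave the keyed response as seniors but not as freshmen
    Returns (exact two-sided p value, continuity-corrected chi-square, 1 df).
    """
    n = b + c
    if n == 0:
        raise ValueError("no students changed their response")
    # Exact test: under no change, b follows a binomial(n, 1/2) distribution.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2.0 * tail)
    # Approximate test (continuity correction is an assumption of this sketch).
    chi_square = (abs(b - c) - 1) ** 2 / n
    return p_exact, chi_square


# Hypothetical counts, not taken from the study.
p_exact, chi_square = mcnemar_test(10, 30)
```

With counts this lopsided both forms of the test reject the hypothesis of no change; with equal counts the exact p value is 1.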


THE SCALE ITEMS


Of the 123 items, 79 are listed in Table 
1. Some data on misclassification propor- 
tions and homogeneity appear elsewhere 
(15). The Kuder-Richardson formula 21 
reliability (KR 21) in an independent 
sample of 130 Vassar freshmen and 81 
seniors is .84 for the first 31 items and
.88 for the first 72. The reliability for all
123 items was only .84 for this sample, and 
it was also substantially below that of the 
72-item scale in some other samples. Items 
in Table 1 which did not discriminate 
the same persons as freshmen and seniors 
at the .01 level of significance are pre-
ceded by an asterisk; the final seven
items could be substituted for these
starred items, probably with little effect
on total scale reliability.
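
For readers unfamiliar with KR 21, a minimal sketch of the standard formula follows. It needs only the number of items, the mean, and the variance of total scores. The function name is ours. As a check, the freshman and senior means and variances reported in Table 5 for the 72-item scale reproduce the reliabilities given there to three decimals.

```python
def kr21(n_items, mean, variance):
    """Kuder-Richardson formula 21 for a test of dichotomous items.

    KR21 = (k / (k - 1)) * (1 - M * (k - M) / (k * s^2)),
    where k is the number of items, M the mean total score,
    and s^2 the variance of total scores.
    """
    if n_items < 2 or variance <= 0:
        raise ValueError("need at least two items and a positive variance")
    k = n_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Table 5 values for the 72-item scale (274 students tested twice).
freshman_reliability = kr21(72, 22.737, 81.749)   # about .821
senior_reliability = kr21(72, 35.628, 120.993)    # about .863
```

Because the formula treats all items as equally difficult, it gives a lower bound relative to KR 20, which fits the remark that KR 21 runs slightly below split-half coefficients.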


RESULTS AND DISCUSSION


The staff studied the content of the 
items, classifying them under such rubrics 
as freedom from compulsiveness, flexibil- 
ity and tolerance for ambiguity, impuni- 
tive attitudes, critical attitudes toward 
authority (including parents or family, 
the state, religion, rules, etc.), intracep- 
tion, mature interests, unconventionality 
or nonconformity, rejection of traditional 
feminine roles, freedom from cynicism 
about others, realism, and so on; such 
clusters have not been studied statistically. 
A factor method (19) was applied which 
produced the scales consisting of the first 
31 and the first 72 items in Table 1. The 
general factor of either scale was called 
“Rebellious Independence,” a name which 
seems adequate only if “independence” is 
interpreted broadly enough; for example, 
it should include an attitude of tolerance 
toward human weaknesses. 

Some statistics for the 72-item scale ap- 
pear in Table 2. The cooperation of Dr. 
Pergrouhi Najarian, of the Beirut College 
for Women, in obtaining the Arab data, 
and of Dr. Margaret Luszki, of the Na- 
tional Institute of Mental Health, in pro- 
viding the Paine College data for Southern 
Negroes, is greatly appreciated. For use 
with diverse groups KR 21 is known to be 
the most logical reliability measure (9), 
even though it is slightly smaller on the 
average than any kind of split-half co- 
efficient. The values for KR 21 in Table 2 
are cause for some skepticism regarding 
the meaning for other cultural groups of 
the 72-item scale. Beirut and Paine stu- 
dents had more than average difficulty 
in understanding some items, which prob- 
ably decreased reliability. It is likely, 
however, that “Rebellious Independence” 
would have been expressed somewhat dif- 
ferently if scales had been constructed 
primarily for use with either Arab girls
or Southern Negroes. The same argument 
applies, with less force, for Vassar alum-
nae; as expected, however, there is less
decrease in reliability for alumnae sam- 
ples. Also, low reliability often accom- 
panies reductions in variance—and smaller 
variances in Table 2 for alumnae, in com- 
parison with Vassar seniors, would be 
expected for reasons discussed below. If 
the scale were to be used for comparing 
individuals, reliability could be improved 
by presenting the items with multiple 
response alternatives. 

The difference between standard devia- 
tions of the scores for the concurrent 
classes of Vassar seniors and freshmen in 
Table 2 is significant, as it also is for the 
test-retest group of Table 5. Variability 
on a majority of the other personality 
scales has also been observed to be larger 
for Vassar seniors than for freshmen, either 
for the same or different students. What- 
ever the cause of this effect, it is strong 
enough to more than compensate for 
increased homogeneity due to withdraw- 
als from college. For the four classes of 
Paine women students of Table 2, how- 
ever, the differences among standard de- 
viations are not large and would be ex- 
pected by chance alone as often as 16% of 
the time. 

It may be that with more reliable and 
general measures, the effect of increasing 
variances with increasing age will hold 
more generally, at least up to a certain 
age, after which a decrease would be 
expected. Matteson (13) found support 
for his hypothesis that interests of college 
students increased with actual experience, 
and Strong (17) has noted an increased 
diversification of interests for males be- 
tween ages 15 and 25. To the extent that 
this process of diversification was uneven, 
or entailed earlier changes in some subjects 
than in others, variances would at first in- 
crease with age. A number of studies of 
personality development, for example, 
TABLE 1
THE DEVELOPMENTAL SCALE*








I would rather be a steady and dependable 
worker than a brilliant but unstable one. 
(F, 31-55) 

In school I always looked far ahead in 
planning what courses to take. (F, 44-62) 

Straightforward reasoning appeals to me 
more than metaphors and the search for 
analogies. (F, 30-49) 

I have never done anything dangerous for 
the thrill of it. (F, 55-70) 

I would disapprove of anyone’s drinking to 
the point of intoxication at a party. 
(F, 30-65) 

I set a high standard for myself and I feel 
others should do the same. (F, 31-44) 

No man of character would ask his fiancée 
to have sexual intercourse with him before 
marriage. (F, 35-64) 

I would be ashamed not to use my privilege 
of voting. (F, 19-29) 

Lawbreakers are almost always caught and 
punished. (F, 47-66) 

Every family owes it to the city to keep 
their sidewalks cleared in the winter and 
their lawn mowed in the summer. (F, 
30-44) 

I have very few quarrels with members of 
my family. (F, 32-42) 

It is a pretty callous person who does not 
feel love and gratitude towards his par- 
ents. (F, 32-56) 

One of my aims in life is to accomplish 
something that would make my mother 
proud of me. (F, 30-48) 

Sometimes I used to feel that I would like 
to leave home. (T, 37-50) 

My home life was always happy. (F, 28-45) 

I have often gone against my parents’ 
wishes. (T, 16-26) 

In the final analysis, parents generally turn 
out to be right about things. (F, 15-44) 

I have often either broken rules (school, 
club, etc.) or inwardly rebelled against 
them. (T, 28-41) 

Only a fool would try to change our Ameri- 
can way of life. (F, 50-76) 

I go to church almost every week. (F, 50-72) 

I pray several times every week. (F, 36-57) 

I believe in a life hereafter. (F, 39-53) 

In religious matters, I believe I would have 
to be called an agnostic. (T, 16-35) 

I should like to belong to several clubs or 
lodges. (F, 38-68) 

I used to steal sometimes when I was a 
youngster. (T, 23-35) 


I like to talk about sex. (T, 38-65) 

I do not always tell the truth. (T, 49-62) 

People would be happier if sex experience 
before marriage were taken for granted 
in both men and women. (T, 18-32) 

I dislike women who disregard the usual 
social or moral conventions. (F, 39-69) 

I believe women ought to have as much 
sexual freedom as men. (T, 24-47) 

*The history of mankind is a record of 
continual progress, and there is no reason 
to believe that it will not continue. (F, 
25-27) 

I have had periods of days, weeks, or
months when I couldn’t take care of
things because I couldn’t “get going.”
(T, 21-32)

*I do not like to see people carelessly 
dressed. (F, 32-41) 

It is annoying to listen to a lecturer who 
cannot seem to make up his mind what he 
really believes. (F, 16-37) 

A strong person will be able to make up his 
mind even on the most difficult questions. 
(F, 52-63) 

I don’t like to work on a problem unless 
there is the possibility of coming out with 
a clear-cut and unambiguous answer. 
(F, 60-78) 

For most questions there is just one right 
answer, once a person is able to get all the 
facts. (F, 65-93) 

I think I am stricter about right and wrong 
than most people. (F, 58-70) 

I always tried to make the best school 
grades that I could. (F, 29-51) 

A large number of people are guilty of bad 
sexual conduct. (F, 49-69) 

The trouble with many people is that they 
don’t take things seriously enough. (F, 
50-71) 

At times I have been so entertained by the 
cleverness of a crook that I have hoped 
he would get by with it. (T, 34-56) 

A person who doesn’t vote is not a good 
citizen. (F, 27-43) 

Some of my family have habits that bother 
and annoy me very much. (T, 46-61) 

*My parents have often disapproved of my 
friends. (T, 12-15) 

*Army life is a good influence on most young 
men. (F, 35-41) 

Disobedience to the government is never 
justified. (F, 65-83) 

If I were confronted with the necessity of
betraying either my country or my best
friend, I would prefer to betray my
country. (T, 11-24)

Communism is the most hateful thing in the 
world today. (F, 67-87) 

I believe in the second coming of Christ. 
(F, 75-84) 

Everything is turning out just like the 
prophets of the Bible said it would. (F, 
78-91) 

I believe there is a God. (F, 10-25) 

Human passions cause most of the evil in 
the world. (F, 30-50) 

The best theory is the one that has the best 
applications. (F, 31-55) 

In illegitimate pregnancies, abortion is in 
many cases the most reasonable alterna- 
tive. (T, 22-37) 

I have used alcohol excessively. (T, 05-17) 

I have never done any heavy drinking. (F, 
13-43) 

If I could get into a movie without paying 
and be sure I was not seen I would prob- 
ably do it. (T, 22-46) 

I have never indulged in any unusual sex 
practices. (F, 06-17) 

My sex life is satisfactory. (F, 16-26) 

*I often do whatever makes me feel cheerful 
here and now, even at the cost of some 
distant goal. (T, 39-41) 

I would be uncomfortable if I accidentally 
went to a formal party in street clothes. 
(F, 24-41) 

*Some of my friends think that my ideas 
are impractical, if not a bit wild. (T, 
23-26) 

Kindness and generosity are the most im-
portant qualities for a wife to have. (F,
47-60)

I believe we are made better by the trials 
and hardships of life. (F, 12-38) 

Any man who is able and willing to work 
hard has a good chance of succeeding. 
(F, 07-24) 

I am an important person. (T, 16-31) 

I hardly ever tell people what I think of 
them when they do something I dislike. 
(F, 35-59) 

I have never felt better in my life than I do 
now. (F, 37-53) 

At periods my mind seems to work more 
slowly than usual. (T, 53-66) 

My daily life is full of things that keep me 
interested. (F, 04-15) 

*Sometimes I feel that I am about to go to
pieces. (T, 26-28)

I don’t like modern art. (F, 62-81) 

Our thinking would be a lot better off if we
would just forget about words like “prob-
ably,” “approximately,” and “perhaps.”
(F, 81-94)

It is a good rule to accept nothing as certain 
or proved. (T, 23-44) 

Every citizen should take the time to find 
out about national affairs, even if it means 
giving up some personal pleasure. (F, 
10-20) 

People have a real duty to take care of their 
aged parents, even if it means making 
some pretty big sacrifices. (F, 15-28) 

I would like to hear a great singer in an 
opera. (T, 76-92) 

It is very important for my feeling of se- 
curity that people about me like me per- 
sonally. (F, 13-25) 





* Responses, T for true, F for false, are those used more often by high-scorers, that is, by seniors. Figures
are percentages of the same group of 274 individuals as freshmen and as seniors, respectively, who responded in
the direction indicated.
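
The keying described in the table note can be sketched as follows: a subject's score is the number of items answered in the senior direction. The three items and their keys are taken from Table 1. The short item labels, the function name, and the treatment of unanswered items (no credit) are assumptions of this sketch, not details given in the paper.

```python
# Keyed (senior-direction) responses for three Table 1 items.
# The short labels are hypothetical names for the items.
KEY = {
    "steady_dependable_worker": "F",       # "I would rather be a steady and dependable worker..."
    "gone_against_parents_wishes": "T",    # "I have often gone against my parents' wishes."
    "dislike_modern_art": "F",             # "I don't like modern art."
}


def developmental_score(responses, key=KEY):
    """Count items whose response matches the keyed direction.

    responses maps item label to "T" or "F"; missing items earn no credit
    (an assumption of this sketch).
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


example = {
    "steady_dependable_worker": "F",
    "gone_against_parents_wishes": "F",
    "dislike_modern_art": "F",
}
score = developmental_score(example)  # two of the three responses are keyed
```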


those of White (20), have emphasized 
the unevenness of the processes involved in
attaining greater maturity. Some data of 
Corey’s (2) on attitude items show larger 
variances for student groups at later ages. 
In agreement with this theory, standard 
deviations of Vassar age groups in Table 
2 vary significantly, rising and falling for 
freshmen, seniors, middle-aged alumnae, 
and older alumnae. 

If Vassar seniors are more diverse than 


their elders on the trait, Rebellious In- 
dependence, at the same time possessing 
on the average more of it (see Table 2), 
then they may also have become more 
aware of, or more sensitized to, the im- 
mediate problems of social conformity 
(6). Interview data indicate that this is 
undoubtedly the case for the majority 
of these young women at this time in their 
lives. There is little evidence, however, 
that this increased awareness of, or con- 
TABLE 2
COMPARISON OF VARIOUS GROUPS ON THE SHORT FORM (n = 72) DEVELOPMENTAL SCALE

College   Class         N      Mean       KR 21*
Vassar    Freshmen      [?]    23.567     .825
Beirut    Freshmen      [?]    24.929     .518
Paine     Freshmen      [?]    16.603     .656
Paine     Freshmen      [?]    19.867     .717
Paine     Sophomores    [?]    [?].541    .581
Paine     Sophomores    [?]    [?].889    .726
Paine     Juniors       [?]    [?].571    .629
Paine     Juniors       [?]    [?].357    .000
Paine     Seniors       [?]    19.000     .704
Paine     Seniors       21     23.286     .578
Vassar    Seniors       197    34.700     .[?]64
Beirut    Seniors       33     25.667     .5[?]7
Vassar    1930-35       50     31.060     .77[?]
Vassar    1904          82     17.780     [?]
Paine     Total         156    18.244     [?]
Paine     Total         83     21.277     [?]

* Reliabilities estimated by Kuder-Richardson Formula 21.
[?] marks an illegible entry. The sex and standard deviation columns are not legible. Two further reliabilities, .759 and .670, belong to the last three rows, but their order cannot be determined.


cern with, conformity is actually accom- 
panied by increasingly homogeneous atti- 
tudes, or, for that matter, by increasingly 
conforming behavior within college classes. 
On the contrary, seniors appear generally 
to be less homogeneous and less conform- 
ing than freshmen. The one exception, 
which agrees with a general observation by 
Jacob (6), is that seniors, paradoxically 
perhaps, express more uniformly than 
freshmen a greater degree of tolerance for 
nonconforming ideas and behavior. Among 
dozens of personality scales, those meas- 
uring authoritarianism, lack of tolerance, 
etc. are among the few for which obtained 
variances for seniors are usually slightly 
less than those for freshmen; of course the 
mean tolerance scores are much higher 
for seniors, in either concurrent or test- 
retest comparisons (15). 

The freshman-senior mean differences 
for institutions in Table 2 are in the 
expected direction, even though the one 
for the Beirut sample is not significant, 
and that for Paine females reaches only 
the 12% level. The test ratio for Paine 
women is significant if pairs of extreme 
classes are grouped (t = 3.06, homogeneous 



variances), and the four means vary sig- 
nificantly (F = 4.28). There are also large 
differences among Vassar age-group means;
the low mean for the oldest alumnae group
may reflect decline in vigor rather than lack
of desire for independence. The higher
Beirut freshman mean may be due to selec-
tivity from a culture in which only very ex-
ceptional women attend college.

The large mean differences between Vas- 
sar and Paine women students are difficult 
to interpret without more information 
about the latter. Social situation, including
status within the general age group, may 
be involved. A trend downward for Paine 
senior women might be related to antici- 
pated re-entry into a larger culture which 
will demand conservative or submissive 
behavior from them. The Paine College
sex difference is significant (t = 3.45), and 
although there is little known concerning 
the meaning of the scale for men, this 
agrees with a preliminary finding reported 
by Lisa Alfert on differences between Ger- 
man university men and women. The scale 
undoubtedly reflects some attitudes which 
are culturally more acceptable when ex- 
pressed by men rather than by women. 
TABLE 3
CORRELATIONS OF DEVELOPMENTAL SCALE
SCORES WITH MEAN TRAIT RATINGS
OF 50 ALUMNAE BY 5 ASSESSMENT
STAFF RATERS*

Trait                                        Correlation
Authoritarianism                             .31 (sign illegible)
Esthetic appreciation                        .31
Capacity for further growth                  .39
Intraception                                 .30
Complexity                                   .48
Independence of judgment                     .36
General ability                              .50
Appreciation of intellectual activities      [illegible]
Originality                                  [illegible]
Emotional interference                       [illegible]
Self-insight                                 [illegible]
Sensuality                                   [illegible]
Likeableness                                 [illegible]
Anxiety about fulfilling own aspirations     [illegible]
Anxiety about behaving in accordance
  with own standards                         [illegible]
Breadth of psychological awareness           [illegible]

* Trait ratings for which the correlation was less ex-
treme than ±.30, of which there were 16, have been
omitted.


Correlations of the scale with a reliable 
suppression measure (18) were uniformly 
low for the samples of Table 2, except for 
Paine freshmen women for whom it was 
-.5[?].

Some validity studies, using college ma- 
jor groups and interview data, are dis- 
cussed elsewhere (15). Tables 3 and 4 
summarize additional validation material 
for the assessment sample² of 50 Vassar
alumnae, which also appears in Table 2. 
The correlation with intelligence, as meas- 
ured by the Terman Concept Mastery 
test, is negligible. Otherwise the correla- 


² In addition to staff members listed in
the first footnote, the following also took 
part in the assessment project: Frank 
Barron, Jack Block, and Richard Crutch- 
field of the University of California; Dwight 
Chapman and Robert Nixon of Vassar 
College; and Eugenia Hanfman of Brandeis 
University. 


TABLE 4
CORRELATIONS OF DEVELOPMENTAL SCALE
SCORES WITH OTHER PERSONALITY
SCALE SCORES FOR 50 ALUMNAE

Scale                                        Correlation
Authoritarianism, F Scale                    -.36
Ethnocentrism, E Scale                       -.44
Authoritarianism, derived scale              -.57
Developmental Scale, long form: 123 items    [illegible]
Impulse Expression (16)                      .70
Masculinity-Femininity subscales (15)
  MF I, Conventionality                      [illegible]
  MF II, Passivity                           [illegible]
  MF III, Intraception                       [illegible]
MMPI Scales
  L, Validity                                [illegible]
  F, Validity                                [illegible]
  D, Depression                              [illegible]
  Pd, Psychopathic Deviate                   [illegible]
  Pt, Psychasthenia                          [illegible]
  Sc, Schizophrenia                          [illegible]
Terman Concept Mastery, Form B               [illegible]


tions in Tables 3 and 4 differ significantly 
from zero in the expected directions, de- 
spite attenuation by unreliability. Correla- 
tions in Table 3 are in agreement with 
some adjectives checked significantly of-
ten by assessment staff members to de-
scribe the 50 alumnae. The adjectives de-
scribing high-scorers included complex,
individualistic, interesting, sophisticated,
frank, rebellious, and the like, while low-
scorers were described as conforming, con-
ventional, conservative, dutiful, mild, etc.

The correlations in Table 4 with Impulse 
Expression and measures of authoritarian- 
ism are of the magnitudes found for con- 
temporary students; it was previously re- 
ported that seniors, when compared with 
freshmen, were more aware of impulses 
and emotional needs, but at the same time 
were less authoritarian (15, 16). This is 
probably also the case for high-scoring 
alumnae, in comparison with low scorers. 
The other correlations in Table 4 are also 
similar in magnitude to those obtained for 
TABLE 5
DEVELOPMENTAL SCALE FRESHMAN-SENIOR
TEST-RETEST DATA FOR 274
VASSAR STUDENTS*

                      Freshman    Senior
Mean                  22.737      35.628
Variance              81.749      120.993
Standard Deviation    9.041       11.000
Reliability           .821        .863

* Both values of t, calculated with allowance for the
correlation (12), are significant at the .001 level.


students. The fact that high scorers tend 
to be low on authoritarianism and ethno- 
centrism is consistent with the idea that 
the scale measures an aspect of social ma- 
turity. Increases in maturity are accom- 
panied by more independence and hence 
by more freedom to criticize, more resent- 
ment of formalized authority, and better 
understanding of the kinds of adaptation 
which are necessary in complex situations; 
but the energy and aggression required for 
such independence is not directed either 
toward the self or toward others who are 
misperceived as entirely different from the 
self. 

The test-retest correlation of .671 was 
used in computing the test ratios for both 
differences in Table 5. Sampling bias could 
not have affected the results in Table 5 
very much: of 288 freshmen, only one did 
not take the test; of the 287 who took the 
test as freshmen, 13 either failed to ap- 
pear, or else did not complete the test, at 
the end of the senior year. 
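
The test ratios for Table 5 can be approximated from the published summary statistics. The sketch below assumes the usual formulas for correlated samples: a t for the difference between correlated means, and the Pitman-Morgan t for the difference between correlated variances. The paper cites McNemar (12) but does not print its formulas, so both the choice of formulas and the function names are assumptions. Whether the reported variances use N or N - 1 as divisor is also not stated; the effect on the result is negligible at N = 274.

```python
from math import sqrt


def correlated_means_t(mean_1, mean_2, var_1, var_2, r, n):
    """t for the difference between two means from the same n persons."""
    standard_error = sqrt((var_1 + var_2 - 2 * r * sqrt(var_1 * var_2)) / n)
    return (mean_2 - mean_1) / standard_error


def correlated_variances_t(var_1, var_2, r, n):
    """Pitman-Morgan t (n - 2 df) for two variances from the same n persons."""
    f = max(var_1, var_2) / min(var_1, var_2)
    return (f - 1) * sqrt(n - 2) / (2 * sqrt(f * (1 - r * r)))


# Table 5 values: freshman first, senior second; test-retest r = .671, N = 274.
t_means = correlated_means_t(22.737, 35.628, 81.749, 120.993, 0.671, 274)
t_variances = correlated_variances_t(81.749, 120.993, 0.671, 274)
```

Under these assumptions the mean difference gives a t near 25.6 and the variance difference a t near 4.4, both beyond the value of about 3.3 needed for the .001 level with this many degrees of freedom, which agrees with the table footnote.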


SUMMARY


Problems arising in evaluating test-re- 
test data for college students, especially 
in the area of attitude measurement, are 
discussed. A scale is presented which was 
derived by selecting attitude items which 
discriminated seniors from freshmen, both 




for concurrent classes and for 274 fresh- 
men retested near the end of their senior 
year. New reliability and validity data 
for the scale are summarized. It would be 
advantageous to have a more reliable in- 
strument for purposes of cross-cultural 
research. Also scores of the present scale 
may not be very valid indicators of the 
maturity of attitudes for subjects from 
other populations, for example, elderly 
subjects or persons from radically differ- 
ent cultures. 

Data support some previous findings 
that there are substantial changes in atti- 
tudes during college, and that the atti- 
tudes expressed will vary with age, sex 
and culture. The variations occur not only 
in means, but also in variances, a fact at- 
tributable to differential maturation rates: 
rates of change toward greater amounts 
of Rebellious Independence, the first fac- 
tor in the scale, differ enough among Vas- 
sar college students to more than com- 
pensate for increasing homogeneity due 
to withdrawals from college. 

Results are interpreted as supporting 
those personality theories which empha- 
size increasing complexity, differentiation, 
ability and independence during late ado- 
lescence; they do not support the view 
that college students become more alike 
in their general attitudes while attending 
college. 
THE EFFECT OF GROUP STUDY ON GRADE ACHIEVEMENT


JOHN T. BLUE, JR. 
North Carolina College 


There is a very great need to examine 
critically the student and the whole con-
text in which his learning takes place. The
quality of the staff, facilities, and in- 
structional techniques means little if the 
motivation of the students is not consonant 
with the institutional objectives. 

The question of whether it is more bene- 
ficial to study alone or in collaboration 
with other students still warrants investi- 
gation. It is an American collegiate culture 
pattern (See 5) for small groups of under- 
graduate students to get together to “bone 
up” for examinations, or to complete 
complex assignments. At the graduate 
levels, study groups are formed in which 
the questions for previous “prelims” and 
the literature in the fields are reviewed. The
value of “skull sessions” is taken for 
granted by most of the students who par- 


ticipate. We have no empirical evalua- 
tion of the relationship between par- 
ticipation in study group activity and 
student achievement in regular academic 
work. In this paper some data will be 
evaluated which will throw some light on 
the question. 


THE NATURAL EXPERIMENT


As a part of the requirements for a 
course in social psychology, each student 
was required to complete an observational 
study. In consultation with the instructor, 
the students developed a problem, gath- 
ered data, and evaluated them. A further 
requirement was that the observed events 
on which the data were based be some 
real or contrived situation on campus. One 
of the students¹ observed that in one of


¹ The writer wishes to thank Hazel Moore
for cooperating in securing the data and for 
a skillful job as participant observer. 



her classes in library science some of the 
students had spontaneously organized a 
study group which met regularly while 
others worked alone. The observational 
problem agreed upon to satisfy the as- 
signment was to analyze the sociometric 
structure of the class and to describe the 
pattern of relationships characterizing the 
participants and nonparticipants in group 
study. The honor point averages which are 
a handy indicator of student achievement 
and scholastic skill were secured from the 
registrar. A check of the honor point 
averages revealed that the spontaneously 
organized group, hereafter called Group 
A, which was composed of eight persons 
had a mean honor point average of 1.81 ±
[illegible], while the mean grade point average
of the twelve persons who studied alone,
Group B, was 1.57 ± .33. The deviations
from the means and the honor point scores 
were examined to see whether the groups 
could be made statistically more alike 
by shifting two persons. After a number 
of calculations proved it possible, two 
members of the class who studied alone 
were urged by the participant observer to 
join the study group. This shift equalized 
the number of persons in each group 
as well as brought the means and vari- 
ances of the grade point averages closer 
together. The mean grade averages of 
Groups A and B then became 1.73 ± .54
and 1.67 ± .51, respectively. The t value
of these two distributions of grade points 
with 18 degrees of freedom was .909. 
Hence, it was concluded that the two con- 
trived groups were not significantly dif- 
ferent. The test of runs supported this 
conclusion. 


*Since the number of subjects was small 
and there was no ground to support the 


During the second week of the students’ 
observational study, these contrasting 
situations were realized by the writer to 
be an excellent example of the natural 
experiment (See 1; 3, pp. 9-12). The 
hypothesis of the natural experiment 
would be that group-study contributes 
to higher grade achievement. Although 
groups were already rather well equated, 
analysis of covariance (4) would thor- 
oughly account for initial differences in 
grade point averages in evaluating the 
differences in grade achievement. Further, 
these natural events, in contrast with the 
contrived laboratory situations, were a 
case of real life in which the subjects were 
experimentally naive. With a student act- 
ing as a participant observer, the sub- 
jects were not affected by interaction with 
an instructor. 

However, if these situations were a 
natural experiment, then it was possible 
to design a projected experiment (1, 3) 
which is invariably more efficient than a 
natural experiment because more control 
is imposed. In order to create a projected 
experiment, it would be necessary to have 
a design specifying the groups and their 
sequences of treatment. The design must 
also specify: the valid and reliable indices 
of the cause and effect factors, the rele- 
vant correlative variables to be controlled, 
and the standards of comparison (control 
group) which are essential in statistically 
testing the hypotheses. An additional re- 
quirement would be that the experimenter 
contrive and impose on the subjects the 
causal condition and obtain measurements 
of what is designated the effect. 





assumption of normality, the nonparametric
test of runs and sign tests were applied to
check all t’s to see if the assumption of
normality was being violated. In no case
did the results of the nonparametric tests
contradict the t test; hence only the t’s are
reported. For a full description of the sign
test and the test of runs, see (2, pp. 274-
261).

IMPOSING AN EXPERIMENTAL DESIGN


The situations as first observed are
represented in Table 1 by the block called
First Phase, in which Group A worked as
a spontaneously organized group, while
the students in Group B worked alone.
By reversing the study conditions, as 
shown in Phase II where Group A students 
studied alone and Group B students 
worked in an organized group, conditions 
for an experiment were met. 

At Phase III, both groups of students 
were to study as organized groups. This 
phase was added as a check on the 
data obtained in the first two phases. 
The groups were regarded as reasonably
equated since the means of the grade
point averages had been made nearly
equal and the t test indicated that the
difference between the means for grades made
previously to the projected experiment was
not statistically significant. The design is
such that, by comparing each group with 
itself from phase to phase, each group was 
its own controlled standard of comparison. 
The changes in grades were acceptable as 
concrete, valid, and reliable indices of grade 
achievement. In the light of the hypothesis, 
an acceptable experimental design had 
been created. 




EXPERIMENTAL EVENTS 


Table 2, showing the results and pre- 
senting an evaluation of the results, suc- 
cinctly describes the experimental events. 


TABLE 1
PROJECTED EXPERIMENTAL DESIGN

Phase          Group A*      Group B*

Phase I          +X            -X
Phase II         -X            +X
Phase III        +X'           +X'

Note. The exceptional student in Group A did not
participate in group study at Phase II or III.
* +X = Studied in a group;
  -X = Studied alone;
  +X' = Group study in 3rd phase.


TABLE 2
EVALUATION OF GRADE AVERAGES FOR GROUPS A AND B COMPARED BY PHASES

             Group A                      Group B                 Comparison of Mean A to Mean B
Phase    Treatment  Grade average     Treatment  Grade average       t        p      df

I          +X       82.3 ± 6.93         -X       69.6 ± 6.72       3.929     .01     17
II         -X       74.5 ± 8.40         +X       77.6 ± 5.88        .795     .10     17
III        +X       81.3 ± 6.60         +X       79.4 ± 5.04        .680     .10     17









During Phase I which lasted three weeks, 
the sociometric structure as well as the 
academic achievement of the students was 
noted. The students received twelve grades 
on quizzes, test demonstrations, and as- 
signments during this period. The mean 
of these grades was computed for each 
student and these individual mean grades 
were averaged for each group. This pro- 
cedure was used to summarize classroom 
grade achievement in all three phases for 
both groups. The mean of the students’ 
averages for Group A* was 82.3 ± 6.93,
while that of Group B was 69.6 ± 6.72.
The t value of 3.929, with 17 degrees of
freedom and the probability value being
.01, warranted the judgment that the
classroom performance of the groups was
significantly different. The test of runs
indicated that the t value was not affected by
the assumptions necessary for the use of
the t test.
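
The comparison of the two group means can be retraced from the summary figures reported above. The following is a minimal sketch in Python, not the author's own computation. It assumes that the ± figures are sample standard deviations, that the two variances may be pooled, and that the groups numbered 9 and 10 students (inferred from the 17 degrees of freedom and the omission of the exceptional student). The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t from group means, SDs, and sizes.

    Assumes the SDs are sample SDs (divisor n - 1) and that the
    two groups share a common variance.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Phase I figures from the text; the group sizes 9 and 10 are assumptions.
t, df = pooled_t(82.3, 6.93, 9, 69.6, 6.72, 10)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Under these assumptions the sketch yields a t of about 4.05, near but not identical to the reported 3.929; the gap may reflect rounding or a different variance formula in the original calculation.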

In order to manipulate the groups to 
determine what the effect of reversing 
the study method would be, the partici- 
pant observer revealed her interest in what 
had been taking place. She candidly asked 
the students in Group A to disband the 
study group, presenting them with a biased 
argument to the effect that they might 
perhaps do better studying alone. Paren- 
thetically, the students with the highest 
averages were most willing to disband the 
group. The members of Group B were 
asked to cooperate in the project and to 
organize a study group. Two members of 


Group B who had done moderately well 
(85 and 81 averages) studying alone on 
the first set of tests were not inclined to 
cooperate; however, they agreed to go 
along. 

The new arrangement, Phase II, was 
observed for four weeks. Only the grades 
for the last three weeks of the second 
period were used in computing the stu- 
dents’ averages because a check of the 
first week’s grades showed that, in both 
groups, all but one person had done poorly. 
The mean achievement for Group A, whose
members were now studying alone, was
74.5 ± 8.40, while the mean for Group B
members who were participants in an
organized group was 77.6 ± 5.88. The
test of differences between the two groups 
for this period was significant at the less 
than .10 level, t being equal to .795 with 
17 degrees of freedom. Hence, the two 
means may be regarded as not significantly 
different. In the third phase of the ex- 
periment, all the students except one 
studied in an organized group. On no test 
or assignment had the exception failed to 
at least tie for the highest grade or lead 
the class. The sociometric observations 
indicated that she was the person whose 
judgement was most highly regarded by 
the members of the class and on whom the 
other members of Group A had come to 
rely when that group had studied together. 
Moreover, this student was the only one
whose average rose under the new condition
imposed in Phase II. In order to
see what effect her absence would have
on the group study method, she was asked to
work alone at Phase III. Because the
method of using t to evaluate changes requires
paired scores, her grade averages
were not used in computing the means
for Group A. The mean achievement for
Group A, studying together without the
influent at Phase III, was 81.3 ± 6.60,
while the mean for Group B was 79.4 ±
5.04. The t value for the difference between
the two means was 0.680 and with
17 degrees of freedom the value of p was
less than .10.


TABLE 3
THE t VALUES OF MEANS OF MEAN AVERAGE GRADE

Treatment

Study in a group = +X
  Group A, spontaneously organized
  Group B, studying as an organized group
  Group A, studying as an organized group
  Group B, studying as an organized group
Studied alone = -X
  Group A, studied alone
  Group B, studied alone

[The numerical columns of this table are not legible in the source.]


The t test for the comparison of achieve- 
ment under similar treatments is pre- 
sented in Table 3. These comparisons most 
fully describe the measured results of 
studying under similar conditions and 
enable us to discern whether the grade 
averages of the groups differ significantly 
when they studied under similar condi- 
tions. This pattern of differences between 
means enables us to distinguish whether 
some extraneous factors such as easier tests 
and assignments have been affecting aver- 
age grades during the various phases. The 
low t values indicate that the two groups 
were responding alike to the conditions 
(treatments) imposed on them during the 
experiment. The responses of both groups 


to a given treatment are not statistically 
significantly different. Thus, we may con- 
clude that the treatments effect similar 
results in both groups. 


EVALUATION OF HYPOTHESIS


The really important comparisons are 
the changes in mean levels of achieve- 
ment in both groups which correlate with 
changes in the study pattern from phase to 
phase shown in Table 4. As was pointed 
out, the grades of the star student previ- 
ously mentioned were omitted from these 
calculations of differences between grade 
average of individuals during the phases 
through which each group passed so that 
the differences between paired scores could 
be used instead of differences between 
means. This method enables one to evaluate
the mean changes rather than the
differences between means, and the calculation
of the t value takes into account the
correlation due to the scores’ being paired.
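
The paired-score method just described can be sketched as follows. This is an illustrative Python sketch only: the function name `paired_t` and the scores are hypothetical and are not the study's data. The t value is the mean of the individual differences divided by its standard error, with one fewer degree of freedom than there are pairs.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """t for the mean of paired differences (second minus first)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    std_err = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / std_err, n - 1


# Hypothetical averages for four students under two study conditions.
phase_1 = [80, 85, 78, 90]
phase_2 = [75, 80, 76, 84]
t, df = paired_t(phase_1, phase_2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Because each student serves as his own control, differences among students in general level drop out of the error term, which is why the paired t can be large even when the two sets of means overlap.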

When the persons in Group A switched
from group study to individual study, the
mean of the differences between students’ scores
on Phase I (group study) and Phase II
(study alone) was 8.88 ± 5.76. The mean
of the differences between the averages
for Phase I (study alone) and Phase II
(group study) for the students of Group
B was 7.26 ± 4.85. The t values shown
in Table 4 indicate that both of these
changes were significant above the .001
level. When Group A was restored to the
group study method in Phase III, the
mean of the differences between the grades
earned studying alone and the grades
earned while studying as a group was
+5.57 ± 6.21 and the change was statistically
significant (t = 5.206, df 17, p >
.001). These comparisons were consistent
with the hypothesis that group study
beneficially affects grade achievement.


TABLE 4
EVALUATION OF THE MEANS OF THE DIFFERENCES BETWEEN PAIRED
SCORES BY PHASES FOR GROUPS A AND B

Columns: Paired scores compared; Average gain or loss;
Degrees of freedom; t value; p value

Group A
  Phase I to Phase II
  Phase II to Phase III
  Phase I to Phase III

Group B
  Phase I to Phase II
  Phase II to Phase III
  Phase I to Phase III

[The cell values of this table are not legible in the source.]
a Statistically, not significantly different.


The use of the sign test to evaluate
the pattern of gains and losses supported
the t’s and threw further light on the
problem. As the r’s and their critical values
were being computed, the magnitude of
the gains and losses of the individual students
was examined. The star student
showed a consistent though small gain
from phase to phase, although deprived of
group study in Phases II and III. The
averages of all other members of Group
A dropped at Phase II when they studied
alone. In Phase III, the students in Group
A improved their performance. The averages
of all the students in Group B except
two increased at Phase II. Of these two,
one student’s averages were tied and the
other lost three points. The student who
lost ground was found to be carrying
five hours over the standard load of fifteen
hours.


*The average for an exceptional student
was omitted from the calculations for reasons
stated later in this paper.
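
The sign test applied to such a pattern of gains and losses can be sketched as below. This is a hedged illustration in Python rather than the author's worked figures: the function name `sign_test` is hypothetical, and the counts (eight gains, one loss, one tie dropped) are inferred from the description of Group B at Phase II, assuming the group had ten members. Here `r` is the number of the less frequent sign, as in the usual tables for the test.

```python
from math import comb


def sign_test(gains, losses):
    """Exact two-sided sign test; tied pairs are dropped beforehand."""
    n = gains + losses
    r = min(gains, losses)  # count of the less frequent sign
    # Probability of r or fewer of the rarer sign when gains and
    # losses are equally likely.
    tail = sum(comb(n, k) for k in range(r + 1)) / 2 ** n
    return r, min(1.0, 2 * tail)


# Assumed counts for Group B, Phase I to Phase II (one tie omitted).
r, p = sign_test(gains=8, losses=1)
print(f"r = {r}, two-sided p = {p:.4f}")
```

With these assumed counts the two-sided probability is about .04, so the sign test points the same way as the t test without relying on the assumption of normality.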

The tangential comparisons are suggestive
and are worth mentioning. The mean
of the changes from Phase II to Phase III
in Group B, 2.91 ± 5.56, which is not
statistically significant, indicated that there
is a limit as to how much improvement
group study can effect. The low p value
leads to the suspicion that the differences
may be due to sequent learning* or to the
increased efficiency of the study group
as it became more integrated. The Group
A difference between Phase I (where the
exceptional student participated in the
study group) and Phase III (when the group
was deprived of its leader) was 1.37 ± 6.77
with 9 degrees of freedom and p being
greater than .100. The smallness of the
difference suggests that the group members’
achievement is not wholly dependent
upon the leader. The Phase I and III comparison
for Group B, with the mean of
the differences being 9.49 ± 4.53, t being
6.69, and p being greater than .001, may be
regarded as the mean amount of improvement
that occurred due to both sequent
learning and group study. The gain is
quite significant.


*Sequent learning is the author’s terminology
for the improvement on a successive
test which is due to the student’s having
learned how to take tests.

CONCLUSIONS


The t values of the means of the differences
between paired scores from phase
to phase, as well as the critical values when
the sign test was applied, warrant the
conclusion that the group study method
results in higher grade achievement for
most students. The indication is warranted
that all but exceptional students will 
probably improve their grades when 
studying in an organized group. 

Replication of the experiment with 
tighter controls and several and varied 
experimental groups seems desirable. The 
design of the replication experiment ought 
to consider the need for a larger number 
so that there will be more certainty about 
the reliability of the statistics. Moreover, 
there ought to be a greater assurance that 
the grading and quality of teaching from
phase to phase are equalized. The assumption
that the tests and assignments
were of equal difficulty from phase to phase 
ought to be investigated further. In spite 
of the limitations set forth, these data 
strongly indicate that the ordinary stu- 
dent will probably improve his grades by 
studying in an organized group.


The present study is concerned with
some aspects of trait ratings of students 
in a public high school. 

Rating techniques, since their formal in- 
ception, have aroused considerable con- 
troversy, much of it centering in the prob- 
lems of estimating reliability and validity 
and, related to the latter, the plaguing 
“halo effect.” 

Reliability estimates have varied greatly 
depending on the method used. Rerating 
almost always produces higher estimates of 
reliability than agreement among raters. 
This is particularly so if the interval be- 
tween rating and rerating is short. 

The problem of the validity of ratings 
has been a disturbing one in that ratings 
generally fail to correlate well with ob- 
jective test scores. However, many investi- 
gators have been willing to settle for the 
position set forth by Garrett and Schneck 
(3) that agreement among judges is an 
indicator of validity as well as reliability. 

Reactions concerning the general value 
of rating techniques have varied widely. 
Typical of the optimistic view represented 
in the literature one reads: “In view of 
all the evidence accumulated during the
past few years, no one can any longer 
deny to ratings a place beside objective 
verbal tests as dependable measuring de- 
vices—uniquely valid for measuring cer- 
tain types of conduct in normal life situa- 
tions” (8, p. 232). On the more pessimistic 
side one finds: “The inconsistencies of the 
ratings of the teachers with themselves 
and with each other indicate dangers in re- 
lying on the ordinary rating scales for the 
measurement of traits” (7, p. 119). 

Most investigators, eschewing either extreme
position, occupied themselves with
determining the conditions under which
ratings can be best made. Much of the
early literature concerned with the avoidance
of “common pitfalls” was reviewed
by Weiss (12). As early as 1927 Watson
(11, p. 74) was able to list 25 “findings
that have been established more or less
firmly through the experimental work of
the last 20 years.” Most of these findings
subsequently became and still are controversial.


* This paper reports a portion of a study
supported by the Silver Hill Foundation,
New Canaan, Connecticut.

Despite its mixed reception over the 
years, the rating scale has more than sur- 
vived; it has permeated nearly all activi- 
ties where assessment of individuals is 
called for. Particularly widespread is its 
use in education. It therefore seems useful 
to make a current investigation of trait 
rating of pupils by teachers. Certainly a 
process that involves many teacher-hours 
deserves periodic examination, if only to 
assess the returns from this large invest- 
ment in time and energy. It is particularly 
important to shed light on the meanings 
of the ratings. First, they undoubtedly af- 
fect administrative and counseling deci- 
sions about individual pupils. Second, the 
ratings must be clarified as a step toward 
determining their value for educational re- 
search. 


METHOD 


The trait ratings of students used in 
this study were made in 1955 and 1956 
by the public high school faculty of a 
Connecticut town. As part of the report 
sent to parents, students are regularly 
rated on the following traits: 


Seriousness of purpose
Industry
Initiative
Influence
Concern for others
Responsibility
Self-control
Neatness and personal appearance

The ratings are made on graphic scales 
with adjective descriptions at five equal 
intervals on the scale. The school proce- 
dure is for teachers to make their ratings 
on the same card. However, in an effort 
to reduce contamination the teachers sub- 
mitted the ratings used in this study in- 
dependently. 

In May of 1955 ratings were obtained 
for the three classes then in attendance. 
The numbers of pupils in the sophomore, 
junior, and senior classes were 76, 89, and 
71, respectively. Each class had about an 
equal number of boys and girls. In May
of 1956 ratings were again obtained. This
time teachers were also asked to indicate 
the amount of confidence they felt in the 
validity of each rating utilizing a graphic 
rating scale. To permit statistical treatment,
all ratings were converted to equivalent
numerical scores ranging from one to five.

Of the 72 juniors rated in 1956, 56 had 
also been rated as sophomores the previous 
year. Nine teachers participated in the 
1955 ratings of this class and 11 the fol- 
lowing year, six participating both years. 
In most cases a student received ratings 
from five or more teachers on each trait. 


RESULTS

Estimates of reliability were obtained by
correlating the average of half the judges’
ratings against the average of the remaining
half and corrected by means of the
Spearman-Brown formula. Table 1 shows
the estimates for the 1955 ratings of two of
the classes. Also shown in Table 1 are
the intercorrelations among the 1955 ratings
for the sophomore and junior classes.


TABLE 1
RELIABILITIES OF TRAIT RATINGS AND THEIR INTERCORRELATIONS*
(For Sophomores, N = 75; for Juniors, N = 89)

[Only fragments of the correlation matrix are legible in the source.
They are given as printed; their column positions are uncertain.]

Trait                  Sophomores    Juniors

(1) Seriousness          (.78)        (.78)
(2) Industry
(3) Initiative            .88          .89
(4) Influence             .82          .79
(5) Concern               .34          .83
(6) Responsibility        .86          .86
(7) Self-control          .62          .73
(8) Neatness              .64          .70
(9) Composite             .90          .94

Other legible entries, not assignable to a cell: .93, .90; (.69), (.62);
.91; .76, .74; .93, .85, .80.

* Correlations in parentheses along the diagonal represent reliabilities
of trait ratings, i.e., average of half the ratings correlated with
remaining half and adjusted by Spearman-Brown formula for double length.


TABLE 2
1956 MEAN TRAIT RATINGS AND RELIABILITIES FOR BOYS,
GIRLS AND NONACADEMIC STUDENTS

Columns: Mean Ratings (M and its σ) for Boys (N = 28), Girls (N = 28),
and Nonacademics (N = 16); Reliability of Ratings* for Boys, Girls,
and Nonacademics.

Rows: Seriousness, Industry, Initiative, Influence, Concern,
Responsibility, Self-control, Neatness.

[The cell entries are too scrambled in the source to be assigned to
rows and columns.]

* Mean of half the ratings correlated with remaining half and adjusted
by Spearman-Brown formula for double length.


On the basis of the 1956 ratings, esti- 
mates of reliability for the junior class 
were calculated separately for the boys, 
girls, and nonacademic students. These re- 
sults are presented in Table 2. With one 
exception correlation size was unrelated to 


sex or course of study. The uncorrected 
coefficient for the academic girls on neat- 
ness was only .18. The distribution of 
neatness ratings for this group was com- 
pressed on the high end of the scale. 
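
The Spearman-Brown correction for double length mentioned above takes the correlation between the two half-panels of judges and estimates the reliability of the full panel. A minimal Python sketch follows; the function name `spearman_brown` and its lengthening factor `k` are illustrative choices, and the only figure taken from the text is the uncorrected coefficient of .18.

```python
def spearman_brown(r_half, k=2.0):
    """Estimated reliability when a measure is lengthened k times.

    r_half is the correlation between the two halves; k = 2 gives the
    usual correction for double length.
    """
    return k * r_half / (1 + (k - 1) * r_half)


# Uncorrected coefficient reported for the academic girls on neatness.
corrected = spearman_brown(0.18)
print(f"corrected reliability = {corrected:.2f}")
```

For an uncorrected value of .18 the corrected estimate is about .31, which shows how little the correction can rescue a coefficient that is low because the ratings are bunched at one end of the scale.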

The ratings given to the sophomore
class in 1955 were correlated with the
ratings received by the same class the
following year. Table 3 presents these results.
It is seen that the 1956 rating for a
particular trait is predicted about equally
well by 1955 ratings on several other
traits. In a number of instances the latter
have superior predictive value. There is,
for example, the curious finding that “1955
seriousness of purpose” correlates .79 with
“1956 self-control,” while “1955 self-control”
correlates only .33 with “1956 self-control.”


TABLE 3
CORRELATIONS BETWEEN 1955 AND 1956 TRAIT RATINGS
(N = 56)

[Column positions under "1956 Trait Ratings" are uncertain in the
source; legible entries are given for each row in the order printed.]

1955 Trait Ratings        Legible entries

(1) Seriousness           .67; .69, .71
(2) Industry              .71; .56, .56, .57
(3) Initiative            .60, .63; .54, .47, .54
(4) Influence             .56, .59; .61, .56, .46
(5) Concern               .57, .57; .52, .42, .46
(6) Responsibility        .66, .61; .47, .51, .60
(7) Self-control          .34, .29; .17, .33, .36
(8) Neatness              .38, .33; .11, .42, .41
(9) Composite             .61, .65; .56, .53, .54

Entries not assignable to a row: .66, .68, .57.

For the junior class of 1956, both the 
ratings given that year and the expressions 
of confidence were subjected to analyses 
of variance. Significant differences (at less 
than .01 level of confidence) were found 
between traits and between raters. Aver- 
age trait ratings ranged from 3.08 for in- 
fluence to 3.91 for both self-control and 
neatness. The range for average ratings 
given by individual teachers on all traits 
was 3.04 to 4.40. 

The expressions of confidence for the 
same group also showed significant differences
among both traits and raters. Influence
was rated with the least confidence,
having a mean of 3.45. The largest amount 
of confidence was placed in neatness where 
the mean rating was 4.11. Differences in 
mean confidence among teachers were wide 
with a range of from 2.86 to 5.00, the 


latter average expressing complete confi- 
dence in the validity of all ratings made. 
The mean confidence rating for all teach- 
ers combined was 4.03. 


Discussion 


The estimates of reliability, when allow- 
ances are made for the varying numbers of 
raters and the differential use of the Spear- 
man-Brown Formula, are similar to those 
summarized for previous studies (2, 9). 

The intercorrelations between traits are 
remarkably high with the two classes pre- 
senting patterns that are strikingly simi- 
lar. The average difference between corre- 
sponding cells in these samples is less than 
.05. Although there are some differences
in the trait names used, the results would 
seem comparable with those of Kornhau- 
ser (4, 5) and Chi (2). The former ob- 
tained high intercorrelations among trait 
ratings of college students made by instructors,
whereas Chi obtained similar results
for teacher ratings of elementary
school pupils. As in the present study, 
Kornhauser found instances of a trait’s 
correlation with another trait to be higher 
than that trait’s own reliability as esti- 
mated by interjudge agreement. The inter- 
correlations between the 1955 and 1956 
ratings supply additional evidence for the 
lack of specificity among trait ratings. 

An interesting, and perhaps important, 
aspect of the rating process is the rela- 
tively high degree of confidence in the va- 
lidity of the ratings held by the teachers. 
They average slightly higher than the scale 
point: “Almost certain rating is accurate.” 
Terman’s (10) raters had also expressed 
considerable confidence in their ratings, 
averaging for 25 traits between “fairly 
certain” and “very certain.” 

It seems clear that teachers’ ratings do 
not represent the discrete patterns of be- 
havior that might be implied by the vari- 
ous trait names. Very likely, 100 different 
trait names could produce similar patterns 
of correlation. It will no longer do to rec- 
ommend increasingly elaborate precau- 
tions to avoid “halo effect.” The present 
ratings were made under highly favorable 
circumstances and with considerable con- 
fidence. Perhaps the hopes of the early 
proponents of rating are unrealistic. 

Lynch’s (6) discussion of the implicit 
assumptions of the rating-scale approach 
suggests that the early goal of valid and 
independent trait ratings was really un- 
attainable. Cattell (1), however, felt that, 
with the use of certain safeguards, trait 
ratings have at least a preliminary value 
in exploring the factors of personality. 
Our own data lack sufficient scope to offer 
much evidence for either view. More prac- 
tically, our concern is with the meanings 
of particular trait ratings under the con- 
ditions with which they can be made in 
the school setting. 

Perhaps the most realistic attitude
would be: “If this be halo effect let us
make the most of it.” Instead of hoping
for literal measures of traits, this approach 
would seek the meanings of the ratings for 
their possible research or other value. 
There is ample evidence that such trait 
ratings correlate highly with grades at all 
educational levels. It may be that the ratings
represent very little more than imperfect
reflections of academic performance;
i.e., they may be given “after the fact” of
grades. Yet, at least to some extent, trait 
ratings may tap personality variables that 
do affect academic achievement. In any 
event, investigative use of trait ratings 
will be improved by regarding them more 
realistically. 


SUMMARY


An examination was made of high school 
teachers’ trait ratings of students, three 
classes being used as replications. Inter- 
correlations among traits were high, thus 
offering evidence for the interchangeability 
of trait names. It was suggested that the 
goal of separate and valid trait ratings 
might be unrealistic. Instead of pursuing 
this goal, it might be well to accept the 
fact that teachers rate on the basis of gen- 
eral impression. The behavior patterns 
underlying the general impression should
be explored.


During the past four years, Sarason and
his colleagues have been developing a scale
to measure the anxiety which children experience
with respect to school examinations.
From the research which they have
conducted thus far (2, 3, 4, 7, 8, 9, 11),
the following findings seem most directly
to support the reliability and validity of
that test anxiety scale (to be described
briefly later in this article): (a) The split-half
coefficient of reliability of the scale is
.81 for a sample of randomly selected children
in Grades 1, 2, and 3; (b) For children
in Grades 1 through 5 there is a
significant and negative correlation (averaging
-.23) between the test anxiety scale
and IQ (Otis Alpha and Beta); (c) There
is a significant and positive correlation between
teachers’ ratings of pupils’ anxiety
and the pupils’ score on the test anxiety
scale; (d) The more game-like the atmosphere
in which the test is administered, the
less apparent are the interfering effects of
anxiety.


* The project of which this study is a part
is being supported by a grant from the U. S.
Public Health Service. We wish to thank
Joseph A. Foran, Superintendent of Schools,
Irving Zweibelson, School Psychologist, and
the teachers and staff of the Milford, Connecticut
Public School system for their genial
cooperation in all phases of the project; Mrs.
John Keim and Doris Kraeling for their aid
in collecting data; and Robert P. Abelson of
our Department for his stimulating guidance
in matters statistical.

The English data could not have been obtained
without the technical aid and moral
support of P. E. Vernon of the Institute of
Education, University of London and M. F.
H. Butcher, Borough Education Officer, Hendon,
London. We wish to express our gratitude
to these men and to the teachers, headmasters,
and pupils of Schools X and Y,
people who, owing to considerations of confidentiality,
must remain anonymous. Thanks
are also due to Paul Tacon, a postgraduate
student at University College London, who
helped in the scoring and statistical analysis
of the English data. Finally, we should like
to thank Roger W. Russell, Executive Secretary
of the American Psychological Association,
who, in his former capacity as Chairman
of the Psychology Department, University
College London, made office space and clerical
facilities available to us throughout the
summer of 1956.

Since the process of validating a measure 
of so complex a psychological construct as 
anxiety must necessarily be a continual 
one, we decided to conduct the cross- 
cultural study which is reported here. This 
study permitted us to investigate: (a) 
the extent to which certain correlates of 
the test anxiety scale were similar in a 
culture which is quite different from our 
own and (b) the effects upon test anxiety
of a school examination, the English 
“eleven plus” examination, whose power 
to determine a child’s educational future 
has no counterpart in our culture. 


RESEARCH SITES 


Milford. Since good liaison and working 
relationships had already been established 
between our research team and the educa- 
tional officials of the public school system 
there, Milford seemed to be a most de- 
sirable locus for further research on the 
children’s forms of the test anxiety scale. 
One of our major research objectives in- 
volved the simple exploratory task of de- 
scribing how scores on the test anxiety 
scale varied with sex, grade, general anxi- 
ety, and lie status (to be explained below). 
It is this part of the research which we
now report in connection with a comparable
treatment of data from the same
scales which were administered in England. 

London. Several considerations guided 
our choice of London as a place in which 
to conduct a comparative study. First, 
we wanted to obtain a foreign sample be- 
cause we felt that, by studying children of 
another culture on the same dimensions as 
children of our own culture, we might be 
able to learn which relationships, if any, 
between test anxiety and other traits (e.g., 
sex and grade) were independent of cul- 
tural influence—at least insofar as it may 
be legitimate to classify England as a cul- 
ture which is significantly different from 
our own. 

A second reason for selecting England 
as a research site stemmed from the very 
practical fact that we would be confronted 
by a minimal language barrier there in the 
administration and comprehension of our 
questionnaire. This relative absence of a 
language barrier was an especially perti- 
nent consideration in view of our intention 
to work with children who might be as 
young as five years. 

Third, because of a unique English ed- 
ucational practice, the “eleven plus” ex- 
aminations, pupils in English state-sup- 
ported Primary Schools seemed to be an 
ideal population from which to obtain data 
bearing upon the validity of the Test 
Anxiety Scale. At the same time, we felt 
it might be possible systematically to 
study the impact of culturally different ex- 
amination pressures upon test anxiety. 

The “eleven pluses” are a battery of 
examinations which most English children 
take in their last year at Primary School. 
Much of the child’s future rests solely upon 
the outcome of these examinations. If he 
passes them, the government stands ready 
to underwrite the cost of a grammar school 
education for him. And grammar schools 
provide a rich, varied, and intensive liberal 
arts education—the sort of background 
which best prepares a child for a university 
curriculum. However, even if a grammar 
school graduate does not move on to 
the university, he is regarded by the 
English public as a decently educated 
person, someone who can qualify for much 
better jobs than if his secondary educa- 
tion were of the sort soon to be described. 
Thus, by passing the “eleven pluses,” a 
child may often be well on his way to a 
rise of one notch or more in the socio- 
economic hierarchy of English life. 

The English child who does not pass the 
“eleven pluses”, however, faces consider- 
ably darker prospects. Unless his parents 
have money to send him to a private sec- 
ondary school, an extremely unlikely pos- 
sibility in the overwhelming majority of 
cases, the failing child is sent to one of the 
secondary modern schools. For the most 
part, the level of education offered at these 
secondary modern schools is markedly in- 
ferior to that found in grammar schools. 
Generally speaking, they offer a “watered 
down” version of the liberal arts curricu- 
lum of the grammar schools. Moreover, the 
motivation toward intellectual pursuits in 
secondary modern schools is likely to de- 
crease as the child realizes that he has 
been relegated to an educational group 
which is considered inferior to the gram- 
mar school population. In any case, when 
they reach the age of 15, the minimum age 
at which they can leave school, many of 
the children from secondary modern 
schools go out to join the labor force with 
much less prospect of upward mobility 
than their grammar school counterparts. 


PROCEDURE 


Variables and Measures 


The Test Anxiety Scale. Insofar as the 
differing circumstances permitted, identical 
measures were obtained on both the American 
and English samples. Our measure of 
test anxiety was a revised and shortened 
version of the Test Anxiety (TA) questionnaire 
which is described elsewhere (7). 
On the basis of item analyses, the original 
42-item TA questionnaire was reduced to 
30 items, and this was the instrument used 
in the present study. Two items from the 
final version of the TA questionnaire are 
presented below for illustrative purposes: 

“1. Do you sometimes dream at night 
that you are in school and cannot answer 
the teacher’s questions? 

“2. Do you worry a lot before you take 
a test?” 

The General Anxiety Scale. The other 
type of anxiety which we measured was of 
a more general type and did not focus upon 
reactions to testing or test-like situations. 
Instead, the General Anxiety (GA) Scale 
covers a wide sampling of the kind of 
worries, fears, concerns, and psychosomatic 
accompaniments of anxiety which children 
may feel generally or in connection with a 
variety of situations. Like the TA ques- 
tionnaire, the GA Scale had been pretested, 
and the version which we used for our 
comparative study consisted of 34 items. 
The following two items from the GA 
questionnaire illustrate the type of content 
which the scale covered: 

“1. Do you get scared when you have to 
walk home alone at night? 

“2. Do you worry that you might get 
hurt in some accident?” 

The Lie Scale. Contained within the GA 
Scale, but tabulated separately, were 11 
items which shall be referred to as the Lie 
Scale. We included this Lie Scale in an attempt 
to evaluate the internal validity of 
the Ss’ responses to the TA and GA questionnaires. 
In addition, we were interested 
in studying “lie status,” a two-fold classification 
derived from scores which fell 
above and below the median Lie Scale 
scores within each country-sex-grade 
group. The items in the Lie Scale consisted 
of positive assertions of fears or anxieties 
which are so universal that anyone denying 
ever having had such feelings could reasonably 
be presumed to be lying, or at 
least to be defending himself against the 
awareness or expression of a psychological 
fact. Examples of assertions of this sort 
may be found in these two items from the 
Lie Scale: 

“1. Do you ever worry about what is 
going to happen? 

“2. When you were younger were you 
scared of anything?” 

Other variables related to the TA Scale: 
Grade and Sex. For the most part, the 
correlates of anxiety with which we were 
concerned referred to the sex and age of 
the S, the latter variable being operation- 
alized in terms of grade level at school. 
Data pertaining to the sex and grade of the 
S were obtained directly in the course of 
administering the questionnaire. 

To sum up, the results which we shall 
report here consist of the interrelations 
among the following variables: test anxi- 
ety, general anxiety, lie status, sex, and 
grade in school. 


The Administration of the Questionnaires 


In the Spring and early Summer of 
1956, our data were collected simultane- 
ously in Milford and London. Identical 
testing procedures and instruments were 
used in both countries. The data were 
gathered in the various classrooms by 
trained female research assistants while 
the regular classroom teacher was out of 
the room. Our measures were presented in 
the following sequence: TA, a drawing 
task, and GA. The drawing task was in- 
troduced between the two anxiety scales 
in an effort to minimize the development 
of a response set. We also wished to use 
the drawing material for an exploratory 
study (4) on the relationships between 
body-image and anxiety. The drawing task 
required the children to draw first a man, 
then a woman, and, finally, a house. 

To ensure comprehension of the anxiety 
scales, all of the questionnaire items were 
administered orally to every class. The Ss 
were provided only with answer sheets 
upon which were printed a series of numbers 
corresponding to the questionnaire 
items. Printed next to each number on the 
answer sheet was a Yes and No alter- 
native. The assistant read each item aloud 
and then paused to permit Ss to encircle 
an alternative for the appropriate number 
on the answer sheet. 


The Samples 


After Ss had been divided into their 
respective country-sex-grade groups, me- 
dians for these 16 groups on the Lie Scale 
were found. The Ss were then further di- 
vided into high and low “liar” groups 
based on these medians. Ten Ss were drawn 
randomly from all but one of the 32 groups 
thus formed. That one group (US, High 
Lie, fourth-grade boys) contained 10 pu- 
pils, all of whom were used as Ss. All Ss 
were drawn from a pool of 1130 pupils— 
533 British, and 597 American. Group n’s 
before sampling averaged 35.3, varying 
from 10 to 55. 
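
The two-stage selection just described (a median split on the Lie Scale within each country-sex-grade group, followed by a random draw of ten Ss per cell) can be sketched as follows. This is only an illustration under stated assumptions: the field names (`country`, `sex`, `grade`, `lie`) are hypothetical, and because the text does not say how scores exactly at the median were classified, the sketch arbitrarily counts them as "low."

```python
import random
from statistics import median


def assign_cells(pupils):
    """Split each country-sex-grade group at its own Lie Scale median."""
    groups = {}
    for p in pupils:
        key = (p["country"], p["sex"], p["grade"])
        groups.setdefault(key, []).append(p)

    cells = {}
    for key, members in groups.items():
        cut = median(m["lie"] for m in members)
        for m in members:
            # Assumption: scores exactly at the median count as "low";
            # the text does not say how ties were handled.
            status = "high" if m["lie"] > cut else "low"
            cells.setdefault(key + (status,), []).append(m)
    return cells


def draw_sample(cells, per_cell=10, seed=0):
    """Draw per_cell pupils at random from each cell; keep small cells whole."""
    rng = random.Random(seed)
    return {
        key: list(members) if len(members) <= per_cell
        else rng.sample(members, per_cell)
        for key, members in cells.items()
    }
```

With 2 countries, 2 sexes, 4 grades, and 2 lie-status levels, this yields 32 cells of 10 Ss each, or 320 Ss in all, provided every cell holds at least ten pupils.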

The American subjects. The American 
Ss were drawn from four grades in six 
schools of the school system of Milford, 
Connecticut, an expanding residential com- 
munity of 38,000 population. Milford is 
perhaps an extreme example of a type of 
town in Connecticut along the shore: 
though not an industrial center itself it has 
become the site of residential developments 
which house those who commute by auto- 
mobile or rail to the centers of New Haven, 
Bridgeport, Norwalk, Stamford, and New 
York City. The six schools chosen ap- 
peared to include pupils from the entire 
range of socioeconomic classes in Milford. 

The English subjects. Our English data 


*Tables showing n’s and within-group 
variances for the 32 groups are on file at the 
American Documentation Institute, Photo- 
duplication Service, Library of Congress, 
Washington 25, D. C. Order Document 
No. 5602, remitting $1.25 for 35-mm. micro- 
film or $1.25 for 6 by 8 in. photocopies. 
Make checks or money orders payable to: 
Chief, Photoduplication Service, Library of 
Congress. 
were collected in two state-supported Primary 
Schools which are located in the 
Borough of Hendon, a residential area in 
North London. One of these schools, School 
X, is situated in a distinctly well-to-do 
neighborhood, and about 50% of its pu- 
pils are from middle- and upper-middle- 
class families. The other school, School Y, 
caters largely to a lower-middle and skilled 
working class population. We felt that, 
in combination, these two English schools 
represented a range of social classes which 
was as equivalent to our American sam- 
ple as could be arranged within the limits 
of practicality. 

Within these two Primary Schools, we 
covered all classes from Grade 1 through 
Grade 5. In terms of age, pupils in these 
grades are similar to American children 
in the same grades. For example, Grade 
1 is populated largely by 6-year-olds; 
Grade 2 by 7-year-olds, etc. 

Although we were fairly successful in 
matching our English and American sam- 
ples with regard to grades, sex, and social 
class, it should be noted that certain dif- 
ferences in institutionalized educational 
practices precluded a completely com- 
parable matching. Thus, for example, the 
English child tends to be exposed to the 
alphabet and to reading somewhat earlier 
than the American child. This exposure 
often occurs in Infant School; that is, 
when the child is five (or even four), 
and is attending the equivalent of our 
kindergarten. Moreover, one gets the gen- 
eral impression that English teaching, as 
compared to our own, tends to put more 
stress on the learning of subject matter 
than on the attainment of social adjust- 
ment. Hence, and especially in the classes 
to which brighter children are assigned, 
the atmosphere tends to be considerably 
more consciously “intellectual” than is 


*Since there was no comparable fifth 
grade group included in the American sam- 
ple, we omitted the English fifth-graders from 
the analysis of results in the study. 
typical of most classrooms in American 
public schools. 


HYPOTHESES 


Since school examinations would appear 
to be so much more important to a child’s 
future in England than in America, we 
would expect a generally higher level 
of test anxiety in English children than 
in their American counterparts. Our 
principal hypothesis regarding test anx- 
iety scores is, therefore, that the English 
TA mean will be significantly higher than 
the TA mean for American Ss. In con- 
trast to this prediction regarding TA 
scores, there is no reason to expect dif- 
ferences in GA means between countries. 

A second hypothesis is that there will 
be a general rise in test anxiety (and 
again, not in general anxiety) as grade 
level increases. This is based on the as- 
sumption that as examinations increase in 
number and as academic performance is 
evaluated with increasing discrimination, 
which is assumed to be the case as grade 
level increases, anxiety about performance 
on tests and in test-like situations will 
increase. 

Because both cultures condone freer 
emotional expression in girls than boys, 
we hypothesize that girls will have higher 


anxiety scores than will boys. Finally, 
we hypothesize that high “liars” will be 
defending against admitting anything ad- 
verse about themselves and will there- 
fore have lower TA and GA scores than 
will low “liars.” 


RESULTS AND INTERPRETATION 


The TA Scale. Means for each group 
are presented in Table 1. Employing the 
F test for homogeneity of variance dis- 
cussed and tabled in Walker and Lev 
(10:192), we find F_max equal to 54.01/6.54, 
or 8.26, where the .05 level requires 
a value of 10.7. These 32 variances may 
be considered, therefore, homogeneous. 
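
The homogeneity check above is simple arithmetic: the largest cell variance is divided by the smallest, and the ratio is compared with a tabled critical value. A minimal sketch follows; the function name is ours, and only the two extreme variances quoted in the text are used, since the remaining 30 are not reported here.

```python
def f_max(variances):
    """Ratio of the largest to the smallest group variance."""
    return max(variances) / min(variances)


# Extreme TA variances and the .05 critical value quoted in the text.
ta_ratio = f_max([54.01, 6.54])
critical_05 = 10.7

# A ratio below the critical value gives no grounds for rejecting homogeneity.
print(round(ta_ratio, 2), ta_ratio < critical_05)
```

The same function applied to the extreme GA variances reported later (97.15 and 3.51) gives the much larger ratio cited for that scale.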

A summary of the results of the TA 
variance analysis is presented in Table 
2. The principal prediction (i.e., that the 
English TA mean will be significantly 
higher than the American TA mean) is 
confirmed at the .001 level. The sex 
variable also is significant. A glance at 
Table 1 reveals that girls have a higher 
mean TA score than boys. Does this mean 
that girls are actually more anxious about 
tests and test situations than boys? Or 


*The number of variances tabled is only 
12, but a value not significant at k = 12 can- 
not be significant at k > 12. Accordingly F 
values for n = 9 and k = 12 were employed. 
TABLE 2 

SUMMARY TABLE FOR THE ANALYSIS OF VARIANCE 

Source of Variance  | df | TA Mean Squares | GA Mean Squares 
Between Cells       |    |                 | 
Country             |    | 616.05***       | 8.78 
Sex                 | 1  | 171.11**        | 4226.78*** 
Grade               | 3  | 146.19***       | 96.64* 
  linear            | 1  | 261.06***       | 196.70* 
  quadratic         | 1  | 23.27           | 10.88 
  residual (cubic)  | 1  | 154.22*         | 2.36 
Lie                 | 1  | 2020.05***      | 
C × S               | 1  | 1.80            | 33.15 
C × G               | 3  | 57.             | .25* 
  linear            | 1  |                 | 
  quadratic         | 1  |                 | 
  residual (cubic)  | 1  |                 | 
C × L               |    |                 | 
S × G               |    |                 | 
S × L               |    |                 | 
G × L               |    |                 | 
  linear            |    |                 | 
  quadratic         |    |                 | 
  residual (cubic)  |    |                 | 
C × S × G           |    |                 | 
C × S × L           |    |                 | 
C × G × L           |    |                 | 
S × G × L           |    |                 | 
C × G × S × L       |    |                 | 
Within Cells        |    |                 | 
Total               |    |                 | 

* p < .05 
** p < .01 
*** p < .001 


is our scale merely constructed so as to 
elicit more admissions of anxiety from 
girls than from boys, i.e., is it a female-biased 
set of items? Research employing 
classroom observations, Rorschach 
interviews, item factor analysis, and pa- 
rental interviews of high and low anxious 
children is now under way in an attempt 
to provide answers to these questions. 
A proper evaluation of our hypothesis 
regarding TA increment with grade is a 
test of the linearity of trend, upward, over 
grades.* The linear trend over grades is 


* See (1) and (5) for methods of trend testing. 
significant at the .001 level. Since there 
is also a residual TA trend over grades, we 
conclude that although our hypothesis 
of a linear increment in TA over grades 
is confirmed, there seem to be other 
sources of variation in the data indicat- 
ing that the hypothesis is perhaps limited 
in scope. 
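
One conventional way of carrying out such a trend test, for four equally spaced grades with equal n, is to split the between-grade sum of squares into three single-df components by orthogonal polynomial contrasts. The sketch below illustrates that standard technique; it is not a reproduction of our computations, the grade means in it are hypothetical rather than those of Table 1, and the figure of 80 Ss per grade is simply 320 Ss divided over four grades.

```python
def trend_components(means, n_per_group):
    """Linear, quadratic, and cubic sums of squares for four equally spaced groups."""
    contrasts = {
        "linear": (-3, -1, 1, 3),
        "quadratic": (1, -1, -1, 1),
        "cubic": (-1, 3, -3, 1),
    }
    parts = {}
    for name, coefs in contrasts.items():
        estimate = sum(c * m for c, m in zip(coefs, means))
        parts[name] = n_per_group * estimate ** 2 / sum(c ** 2 for c in coefs)
    return parts


def between_groups_ss(means, n_per_group):
    """Ordinary between-groups sum of squares for equal group sizes."""
    grand = sum(means) / len(means)
    return n_per_group * sum((m - grand) ** 2 for m in means)


# Hypothetical grade means (grades 1 to 4), 80 Ss per grade.
means = [9.0, 10.5, 12.5, 12.0]
parts = trend_components(means, 80)
print(parts, between_groups_ss(means, 80))
```

Because the three contrasts are orthogonal, their sums of squares add up to the between-grade sum of squares. The TA entries of Table 2 behave this way: 261.06 + 23.27 + 154.22 = 438.55, which agrees within rounding with 3 × 146.19 = 438.57.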

Lie status is shown to be a significant 
variable in the analysis, high liars (above 
the country-sex-grade group lie scale 
median) obtaining lower TA scores as 
predicted. It has been suggested that at 
least part of the power of Lie scores to 
differentiate TA means is due to a set 
factor in answering Yes or No to the 
questionnaire. Detailed research on the 
roles in TA, GA, and Lie scores of self- 
other content, acquiescence, and whether 
items are couched in terms socially favor- 
able or unfavorable to the referent (self- 
other) is now being carried out. 

Two unanticipated significant interactions 
were revealed in the analysis, both 
involving grade, both quadratic trends: 
Country × Grade and Grade × Lie. 
The Country × Grade quadratic interaction 
is an indication of difference between 
countries in quadratic trend over 
grades. For the English group, TA scores 
tend to rise to the third grade and then 
fall in the fourth, while American scores 
tend to fall from first to second grade and 
rise from the second to the fourth. At 
no point do the trend lines cross, the 
American scores being consistently below 
the English and the greatest difference 
being in the second and third grades. The 
Grade × Lie quadratic interaction reflects 
the difference between high and low 
liars’ quadratic trends in TA score over 
grade: the low liars’ trend is up from the 
first to the third grade and down in the 
fourth; the high liars have a very slight 
dip in mean TA from the first to the 
second grade and then rise slightly to 
the fourth. (We draw the reader’s attention 
to the means presented in Table 
1.) 

Why the grade-by-grade English TA 
means should reach their peak in the 
third rather than in the fourth grade, the 
grade previous to that in which the 
eleven plus examinations are taken, is 
something of a mystery.* 

The GA Scale. GA means are presented 
in Table 1. We found F_max for GA 
variances equal to 97.15/3.51, or 27.68, 
which is significant beyond the .01 level. 
The 32 variances, then, were hetero- 
geneous. The most appropriate trans- 
formation for scores whose means and 
variances are related as they are in these 
data is the Freeman-Tukey transformation 
(6). It was found that this transformation 
did little to alter the heterogeneity of 
variance. Therefore, we decided not to 
transform scores but rather to use raw 
scores and be conservative in our use of 
p values. We considered significant only 
those F ratios attaining the .01 level. 
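
For readers unfamiliar with the transformation named above, the sketch below shows the two forms usually attributed to Freeman and Tukey: a square-root form for counts and an angular (double arcsine) form for a count out of a fixed number of items. Which form was applied is not stated in the text, so both appear here purely as illustrations; the function names are ours, and the use of 34 as the item total rests on the assumption that a GA score is a count of Yes answers to the 34 GA items.

```python
import math


def ft_sqrt(x):
    """Freeman-Tukey square-root form for a count x."""
    return math.sqrt(x) + math.sqrt(x + 1)


def ft_arcsine(x, n):
    """Freeman-Tukey angular (double arcsine) form for x successes out of n."""
    return (math.asin(math.sqrt(x / (n + 1)))
            + math.asin(math.sqrt((x + 1) / (n + 1))))


# Hypothetical raw GA scores, transformed both ways (n = 34 items assumed).
raw = [0, 5, 17, 34]
print([round(ft_sqrt(x), 3) for x in raw])
print([round(ft_arcsine(x, 34), 3) for x in raw])
```

Either form is meant to make the variance of a score less dependent on its mean; as noted above, it did little to reduce the heterogeneity in these data.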

Thus we found Sex and Lie Status the 
only significant effects in the GA analysis 
of variance summarized in Table 2. Girls 
gave higher scores than boys and low 
liars gave higher scores than high liars, 
as predicted. The linear Grade and linear 
Country X Grade interaction effects ap- 
proached significance at the .025 level. 
This trend, while more pronounced than 
we had expected, is still below the level 
of significance we had established as a 
basis for rejecting our hypotheses. Our 
hypothesis that High Liars would refrain 
from admitting anxiety is supported in 
most convincing fashion, although again, 


* One of the editorial readers of this paper 
offers the suggestion that some of the pe- 
culiarity connected with the grade trends 
may be due to the fact that this was a cross- 
sectional rather than a longitudinal study. 
The different grade groups may be different 
for reasons quite unconnected with grade 
level per se—for example, differing economic 
and post-war factors for the first-graders in 
their preschool years than for the fourth- 
graders. 
part of this finding may be confounded 
with a set factor. We found encouraging 
evidence for the existence of a difference 
in function being measured by the TA and 
GA Scales in the convincingly small Country 
effect on the GA Scale in contrast to 
its effect on the TA Scale. This difference 
supports our a priori reasoning and hy- 
potheses concerning the differential effect 
of the Country variable on these two 
scales. 

We speculate on the origin of the sex 
difference as follows: in both cultures we 
feel that it is less socially acceptable for 
boys to admit to the worries and fears 
included in the scales than for girls so 
to admit. It is perhaps sufficient to say 
that the terms “sissy” and “booby” refer 
to boys who are seen as both afraid and 
girlish and that boys will generally make 
every effort to deny any suggestion of 
either. In contrast, admission of fears or 
worries is socially acceptable in girls. We 
conclude that these differential roles would 
be sufficient to give rise to differences in 
either TA or GA scores; we hypothesize 
that they are, in fact, the source of the 
Sex effect. 


SUMMARY 


In this study, conducted as part of a 
program which aims to develop a valid 
measure of test anxiety, equivalent groups 
of American and English school children 
were compared on the following measures: 
test anxiety, general anxiety, and a lie 
scale. Because of the relatively greater 
importance of school examinations in Eng- 
land, we predicted that English children 
would have higher mean TA scores than 
their American counterparts. On the other 
hand, we expected to find no difference 
in mean GA scores between the two coun- 
tries. In both countries, it was predicted, 
in line with the assumption that the im- 
portance of school examinations to chil- 
dren increases with grade, that there 
would be a significant positive correlation 
between TA and school grade. GA scores, 
on the other hand, were expected to be 
more independent of grade level. Because 
they were presumed to defend more 
drastically against the admission of fear, 
Ss scoring high on the Lie Scale were 
expected to have lower TA and GA means 
than those who were low on the Lie Scale. 
Finally, since it is more socially acceptable 
for girls than boys to express fear and 
distress, we predicted that girls would 
have higher TA and GA means than 
boys in America as well as in England. 

Our findings confirmed each of the 
foregoing hypotheses. Accordingly, the 
results may be viewed as lending further 
support to the validity of the measures 
upon which they were based. 


CONCEPT LEARNING IN MENTAL DEFECTIVES AS A FUNCTION 
OF APPROPRIATE AND INAPPROPRIATE “ATTENTION SETS” 

In his exhaustive studies of set, Harlow 
(5) obtained results which indicate that 
this phenomenon is of major importance 
in learning. A particular variety of set— 
labeled an “attention set” or “perceptual 
bias” —is believed by some to play an im- 
portant role in the behavioral dynamics 
of the emotionally disturbed. In this con- 
nection, Wickens (9) describes a type of 
perceptual activity commonly found in 
disturbed individuals. This behavior in- 
volves a tendency on the part of the pa- 
tient to respond to certain aspects of the 
environmental situation and to ignore 
other aspects. Out of all the stimulus 
components of a social situation, the in- 
dividual may respond only to those events, 
for example, which indicate some criticism 
of himself. He may even distort his per- 
ceptions so that they fit into his inap- 
propriate schema. Successful therapy in 
such a case would at least in part involve 
altering the patient’s inaccurate percep- 
tions of the environment. Wickens takes 
the position that these perceptual proc- 
esses are molar phenomena which can be 
predicted from certain molecular S-R 
postulates. 

Eckstrand (3) investigated the phe- 
nomenon of perceptual bias or attention 
set by means of a study of learning in a 
group of air cadets. He had three groups 


* The data upon which this paper is based 
are taken from a dissertation submitted by 
the writer to the Graduate School of the 
George Peabody College for Teachers in 
partial fulfillment of the requirements for 
the degree of Doctor of Philosophy. The 
writer wishes to express his deep gratitude 
to Gordon N. Cantor for his guidance and 
encouragement. 

*Now at Arkansas Child Development 
Center, Little Rock. 


of 40 Ss each learn a discriminative motor 
task which required the association of four 
keys with four differently colored forms. 
In this phase, both color and form were 
relevant cues. Prior to this learning, each 
of the groups had been given a different 
kind of pretraining. In the first phase of 
the pretraining, group A learned a task 
requiring use of the concept of form; 
group B learned a task requiring use of 
the concept of color; group C learned a 
task in which both form and color con- 
cepts were utilized. In the second phase 
of the pretraining, each group learned a 
task in which both the form and color 
aspects of the stimuli were appropriate 
in learning the task. In the test task, each 
of the groups was divided into two sub- 
groups. In each case, one subgroup learned 
a problem with the forms but not the 
colors from the second pretraining test 
task present, while the second subgroup 
learned a problem with the colors but not 
the forms of the second pretraining task 
present. The results indicated that cue 
attention habits established during pre- 
training transfer to the learning of later, 
similar tasks. Buss (1, 2) and Kendler 
and D’Amato (7) have conducted studies 
which demonstrated somewhat related 
types of phenomena. 

Kendler and D’Amato utilized a media- 
tion hypothesis in arriving at their pre- 
dictions. Their hypothesis involved the 
notion that one builds up implicit re- 
sponses during pretraining which influence 
his performance on a subsequent task. For 
example, if the pretraining task involves 
the concept of color, it is assumed that the 
S learns an implicit response of the 
nature, “It’s color that’s important.” 
When the transfer task stimuli are presented, 
it is assumed further that these 
stimuli elicit the previously learned im- 
plicit verbal response. This implicit re- 
sponse has cue properties which are be- 
lieved to affect performance on the 
transfer task. 

Previous studies in this area have uti- 
lized children and young adults of average 
or above average intelligence as Ss. The 
present study was carried out to investi- 
gate further the utility of the attention 
set concept, and also to furnish additional 
information about the learning processes 
of mental defectives. The problem was 
designed to investigate the influence of pre- 
training on subsequent learning when: (a) 
the set employed in pretraining is ap- 
propriate or inappropriate for a transfer 
task; (b) differing amounts of pretraining 
under these two conditions of appropriateness 
are given; and (c) the set 
employed in pretraining is neither appro- 
priate nor inappropriate for the transfer 
task. It was predicted that the set em- 
ployed in the pretraining which was ap- 
propriate for the subsequent learning task 
would facilitate performance on that task, 
whereas the set which was inappropriate 
would have an interfering effect. It was 
expected further that a relationship would 
be found between amount of facilitation 
or interference and amount of pretraining. 


METHOD 


Subjects. The sample consisted of 60 
Ss, who were mental defectives drawn 
from the population of male patients at 
the Lincoln State School in Lincoln, 
Illinois. The Ss selected ranged in age from 
15 to 29 yr. The IQ range was 40 to 56, 
based on the Revised Stanford-Binet, ad- 
ministered within the past three years. 
The Ss selected had been diagnosed by 
the medical staff as having mental de- 
ficiency due to a familial or undifferenti- 
ated (idiopathic) etiology. No S was in- 
cluded who exhibited an observable motor 
defect or who had a history of convulsions. 
Twelve Ss were randomly assigned to each 
of five treatment groups. Table 1 shows 
the means and standard deviations for 
the CA’s and IQ’s of each of the five 
groups. 

Apparatus and procedure. The appa- 
ratus used in the pretraining problem 
consisted of a 2 ft. square base with a 2 
ft. square vertical panel erected in the 
center of the base. In the middle of the 
vertical panel, a 3 x 5 in. piece of frosted 
glass was mounted. Behind this glass, a 
single white seven-watt bulb was located. 
On the section of the base facing the S 
there were three flush-mounted switches, 
arranged in a semi-circular fashion. The 
E could, by means of a switch mounted 
on the rear of the vertical panel, set any 
one of the three switches placed before 
the S to turn off the light. Wooden but- 
tons, 2 in. in diameter, were placed over 
these switches and were numbered 1, 2, 
and 3. Just below the glass, in the ver- 
tical panel, there was located an opening 
measuring 3 x 5 in. Metal guides were 


TABLE 1 
MEANS AND SD’s FOR CA’s AND IQ’s OF THE FIVE TREATMENT GROUPS 

Group                    | CA M  | CA SD | IQ M  | IQ SD 
I.   Color-Criterion     | 19.83 | 4.90  | 46.83 | 5.10 
II.  Color-Overlearning  | 19.83 | 4.53  | 46.50 | 4.36 
III. Form-Criterion      | 20.92 | 4.91  | 46.92 | 4.78 
IV.  Form-Overlearning   | 21.08 | 5.21  | 47.92 | 4.17 
V.   Control             | 19.92 | 4.36  | 46.92 | 3.42 
placed above and below the opening so 
that a 3 x 5 in. card could be slipped into 
the guides and become visible to the S. 
The S’s task was to learn to associate 
each of the three buttons with a concept 
which was presented either verbally by the 
E or pictorially on a card placed in the 
opening. 

In the pretraining phase, the experi- 
mental groups (Groups I, II, III, and 
IV) learned a concept presented pic- 
torially. The stimuli consisted of nine 
colored forms. Three forms were used— 
circles, squares, and triangles; each of the 
stimuli of any one form was painted one 
of three different colors—red, blue, or 
yellow. Hence, each color occurred on each 
of three forms. These cards were placed 
in the opening of the apparatus one at a 
time, the light was turned on, and the S 
was instructed to look at the picture and 
to push each button until that one was 
pressed which turned off the light. The Ss 
were required to associate each button 
with a different color or a different form. 
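
The structure of the pretraining task can be made concrete with a short sketch: nine stimuli formed by crossing three forms with three colors, a rule that picks the correct button from either the color or the form, and a check for one errorless series of the nine stimuli. The particular button assigned to each color or form is hypothetical, since the text does not report the assignments, and the function names are ours.

```python
from itertools import product

FORMS = ("circle", "square", "triangle")
COLORS = ("red", "blue", "yellow")
STIMULI = list(product(FORMS, COLORS))  # the nine colored forms

# Hypothetical button assignments; the source does not report them.
COLOR_KEY = {"red": 1, "blue": 2, "yellow": 3}
FORM_KEY = {"circle": 1, "square": 2, "triangle": 3}


def correct_button(stimulus, relevant):
    """Button that turns off the light under a color set or a form set."""
    form, color = stimulus
    return COLOR_KEY[color] if relevant == "color" else FORM_KEY[form]


def reached_criterion(series, relevant):
    """True if one complete series of the nine stimuli was answered without error.

    series is a list of (stimulus, button_pressed) pairs.
    """
    covers_all = len(series) == 9 and {s for s, _ in series} == set(STIMULI)
    return covers_all and all(
        button == correct_button(s, relevant) for s, button in series
    )
```

Under these assignments, an S responding consistently to form would fail the color criterion, because a stimulus such as a blue circle calls for different buttons under the two sets.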

Experimental Groups I and II learned 
to respond to the color aspect of the 
stimuli. The Ss in Group I were run on 
the task until they were able to respond 
correctly to one complete series of the nine 
stimuli without an error; this group was 
called the Color-Criterion Group. Group 
II learned the same task as Group I, but 
was given an additional 27 trials; this 
group was called the Color-Overlearning 
Group. 

Groups III and IV learned to respond 
to the form aspect of the stimuli. Group 
III was carried to the criterion of one 
series of nine trials without an error and 
was called the Form-Criterion Group. The 
Ss in Group IV received the same task 
as Group III, but were given 27 additional 
trials; this group was called the Form- 
Overlearning Group. Prior to carrying out 
the experiment, the color and form tasks 
were equated for learning difficulty in a 
series of pilot studies. 
The Ss in Group V (Control Group) 
learned to associate animal names, presented 
orally by E, with the buttons on 
the apparatus described above and were 
given neither a color nor a form set. The 
labels “collie” and “hound” were associated 
with Button 1, “holstein” and 
“jersey” with Button 2, and “sparrow” 
and “robin” with Button 3. The number of 
trials given these Ss was determined by 
computing the average number of trials 
to criterion taken by a preceding S from 
Groups I and III. This gave Group V ap- 
proximately the same amount of pre- 
training experience as the average for 
Groups I and III. 

Since it was found in preliminary work 
that some Ss could not learn these tasks, 
a maximum of 144 trials was set for the 
experimental groups. If the S could not 
learn the task within this limit, he was 
discarded and another S substituted for 
him. On this basis, there were four Ss 
from the Form Groups and three Ss from 
the Color Groups who could not learn the 
pretraining task. 

Following the pretraining, all five groups 
were given the same learning task. Figure 
1 presents an approximate reproduction 
of the nine stimuli used in the transfer 
situation. In order to make this a rela- 
tively difficult problem, three classes of 
forms were selected (stimuli with rounded 
contours, with jagged contours, and with 
a “half-moon” shape) and the individual 
members within each class were made to 
differ somewhat from one another. In ad- 
dition, different shades of three colors 
(green, brown, and gray) were utilized. 
It will be noted that all these forms and 
colors differ from those used in the pre- 
training tasks. 

The transfer task involved paired-as- 
sociates learning, each of the three types 
of form being paired with one of three dif- 
ferent nonsense syllables. The S was told, 
“I have made up some words to go with 
these pictures. I want you to learn the 
name that goes with each picture.” He 
was then shown one picture representing 
each of the three basic forms and told 
the name of the picture. Nonsense syl- 
lables which were found by Glaze (4) to 
have 100% association value were used; 
these were “VEC,” “BEV,” and “HOB.” 
The pictures were presented manually 
one at a time to the S at a rate of one 
every five sec. If the S was unable to an- 
ticipate the correct name after four sec. 
of exposure, the E pronounced the ap- 
propriate name. If the S did correctly 
identify the picture within four sec., he 
was told, “That is correct.” At the end 
of each presentation of the nine stimuli, 
the cards were rearranged according to a 
predetermined semirandom order. The 
number of trials given to all Ss on the test 
task was 108. 

Predictions. On the basis of a mediation 
hypothesis, one would expect the groups 
to carry implicit responses learned in pre- 
training into a new situation similar to the 
pretraining arrangement. Since the trans- 
fer task was one involving the use of a 
form concept, the Color Group Ss would 
be expected to experience negative trans- 
fer, their mediating response being inap- 
propriate for the new problem. The Form 
Group Ss would be expected to experience 
a positive transfer effect, since for them 
the new stimuli should elicit the appropriate 
mediating response. Any mediating 
responses learned by the Control Group 
would be expected to be neither facilitating 
nor interfering, since such responses pre- 
sumably involved neither a color nor a 
form concept. 

On the basis of Hullian behavior theory 
(6, p. 29), one would predict that a rela- 
tively large amount of pretraining given 
to some of the Form Group Ss should be 
more facilitative than a smaller amount 
of training given to the remaining Form 
Group Ss, because of greater strength of 
the appropriate mediating habit. Con- 
versely, one would predict that, if a rela- 
tively large amount of pretraining were 
given to some of the Color Group Ss, the 
negative transfer effect due to the inap- 
propriate mediating response would be 
greater than in the case of Color Group Ss 
given less training. Specifically, it was 
predicted that the groups would perform 
in the following order of proficiency on the 
test task, listed from best to poorest: 
Form-Overlearning (Group IV), Form- 
Criterion (Group III), Control (Group 
V), Color-Criterion (Group I), and Color- 
Overlearning (Group II). 


RESULTS AND DISCUSSION 


Pretraining tasks. Table 2 presents the 
means and standard deviations involved 
in the trials-to-criterion measure and the 
TABLE 2 
MEANS AND SD’S FOR TRIALS-TO-CRITERION AND ERRORS FOR THE 
FIVE TREATMENT GROUPS ON THE PRETRAINING TASKS 

Groups (M and SD for each): I Color-Criterion; II Color-Overlearning; 
III Form-Criterion; IV Form-Overlearning; V Control 

Trials-to- 96.75 | 18.85 | 96.00 | 21.14 


Criterion 
Errors 





11.86 | 48.83 | 15.61 





| 48 .33 


| 96.00 


32.56 | 96.25ᵃ 


| 
| 
| 19.62 | 40.25 


24.97 90.00 16.17 


43.42 | 20.26 | 42.67 16.38 





ᵃ This mean is the result of the matching technique described in the text. 


error score for each of the treatment 
groups. One trial was counted each time 
a single stimulus was presented. By an 
“error” is meant the pushing of an in- 
correct button prior to pushing the cor- 
rect button when a stimulus was pre- 
sented. As previously indicated, the 
Control Group learned to respond to 
orally presented stimuli. The number of 
trials given these Ss was determined by 
computing the average number of trials 
to criterion taken by a preceding S from 
Groups I and III. 

The scores made by the four experi- 
mental groups on the pretraining task 
were analyzed by means of analysis of 
variance. It was found that the groups 
did not differ significantly (at the .05 
level) from each other in the number of 
trials taken to reach the criterion or the 
number of errors made during the course 
of learning. 

Transfer task. The score used in ana- 
lyzing these data was the number of errors 
made in the 108 trials given each S. By 
an “error” is meant either an incorrect 
response or a failure to respond within 
the time limit set. Table 3 presents the 
mean numbers of errors for the groups and 
the standard deviation of each distribution. 

In order to obtain a more precise analy- 
sis, the data for each S were divided into 
12 successive blocks of nine trials each. A 
Lindquist “Type I” analysis of variance 
design (8, Chap. 13) was then applied to 
the data. Table 4 presents a summary of 
this analysis. 
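
The blocking step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `errors_per_block` and the sample record are hypothetical, and each trial is assumed to be coded 1 for an error and 0 for a correct anticipation.

```python
from typing import List


def errors_per_block(trial_errors: List[int], block_size: int = 9) -> List[int]:
    """Sum error indicators (1 = error, 0 = correct) within successive blocks."""
    if len(trial_errors) % block_size != 0:
        raise ValueError("trial count must be a multiple of block_size")
    return [
        sum(trial_errors[i:i + block_size])
        for i in range(0, len(trial_errors), block_size)
    ]


# Hypothetical record for one S: an error on every third trial of the 108.
record = [1 if t % 3 == 0 else 0 for t in range(108)]
blocks = errors_per_block(record)
print(len(blocks), blocks)
```

With 108 trials and a block size of nine, each S contributes 12 block scores, which is the repeated-measures factor ("trials") in the analysis summarized in Table 4.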

As indicated in Table 4, the treatments 
effect was significant at the .01 level of 


TABLE 3 
MEANS AND SD’S FOR ERRORS COMMITTED ON THE TEST TASK BY THE FIVE 
TREATMENT GROUPS 

       I Color-    II Color-      III Form-   IV Form-       V 
       Criterion   Overlearning   Criterion   Overlearning   Control 
M      63.75       73.08          48.83       50.00          59.33 
SD     15.49       16.08          17.24       22.92          16.76 
TABLE 4 
SUMMARY OF ANALYSIS OF VARIANCE APPLIED TO ERROR DATA SEPARATED 
INTO TRIAL BLOCKS 

Source                    df    Mean Square    F 

Between Subjects 
  treatments                                   3.81* 
  error (b) 

Within Subjects 
  trials                        46.14 
  trials × treatments           2.09 
  error (w) 

* P < .01 
confidence. The interaction between trials 
and treatments was not significant. The 
trials effect was significant at the .001 level 
of confidence, indicating that learning had 
occurred. 

Because of the significant treatments 
effect, a series of t tests was run comparing 
each group mean with that of every other 
group. These means are presented in 
Table 3. All of the differences between the 
means were significant (at the .001 level 
of confidence) except for that between 
the Form-Criterion Group (III) and the 
Form-Overlearning Group (IV). This dif- 
ference was not significant at the .05 level. 
The Form-Criterion and Form-Overlearn- 
ing Groups made significantly fewer errors 
on the test task than did the Color-Cri- 
terion, Color-Overlearning, and Control 
Groups. The Control Group made sig- 
nificantly fewer errors than the Color- 
Criterion and Color-Overlearning Groups. 
Finally, the Color-Criterion Group made 
significantly fewer errors than did the 
Color-Overlearning Group. 

The statistical analysis indicated that, 
except for the expectation that the Form- 
Overlearning Group would perform better 
than the Form-Criterion Group, all of 
the predictions were confirmed. It is dif- 
ficult to give a clear-cut interpretation of 
the failure to find a difference between the 
performances of the two Form Groups, 
particularly in view of the difference which 
did result between the performances of 
the Color Groups. The most plausible ex- 
planation is that in this situation the de- 
gree of facilitation provided by an appro- 
priate mediating habit is not equivalent 
to the degree of interference caused by an 
equally strong mediating habit which is 
inappropriate for the task at hand. This 
could be the case because of a “floor” 
effect operating to prevent any further 
decrease in errors beyond the level of per- 
formance achieved by the Form-Criterion 
Group; or, a difference in size of units 
may exist, so that a greater increment in 
appropriate habit strength would be re- 
quired to produce facilitation comparable 
to the interference produced by a given 
increment in inappropriate habit strength. 

The results obtained appear to offer 
rather clear evidence in support of the 
notion that sets or biases can be set up in 
individuals which cause them to respond 
selectively to certain characteristics of a 
situation and to ignore other character- 
istics. The generality of this phenomenon 
is attested to by virtue of the fact that 
mental defectives served as Ss in this 
experiment. Much information still needs 
to be obtained regarding the manner in 
which such existing biases can be modified 
or eliminated. 


SUMMARY 


This study sought to test predictions, 
based on a mediation hypothesis of trans- 
fer, concerning the effects of pretraining, 
varied both in kind and degree, upon the 
learning of a subsequent task in male 
mental defectives. The predictions in- 
volved: (a) the effect of pretraining upon 
subsequent learning when the set em- 
ployed in the pretraining problem is ap- 
propriate or inappropriate for the trans- 
fer task; (b) the effects of two different 
amounts of pretraining, under these two 
conditions of appropriateness, on the sub- 
sequent learning task; and (c) the effect 
on transfer task performance of pretrain- 
ing designed to institute neither an ap- 
propriate nor an inappropriate set. 

Sixty male mental defectives were ran- 
domly assigned to five groups. On the pre- 
training tasks, Groups I and II learned 
to respond to the color characteristics of 
visually presented stimuli, with Group II 
receiving additional training. Groups III 
and IV learned to respond to the form 
characteristics of the stimuli, with Group 
IV receiving additional training. Group V 
learned to respond to orally presented 
stimuli and was given neither a color nor 
a form set. 
All five groups then learned a new task 
requiring use of the concept of form. It 
was hypothesized that the transfer effect 
would be positive if mediating responses 
assumed to have been learned in pre- 
training were appropriate for the new 
problem and negative if they were in- 
appropriate. It was thus predicted that 
the groups would perform in the following 
order of proficiency on the transfer task, 
listed from best to poorest: Group IV, 
Group III, Group V, Group I, and Group 
II. 

Although no significant difference oc- 
curred between the performances of 
Groups III and IV, all other predictions 
were confirmed, thus offering evidence for 
the utility of the attention set concept, 
even in the case of mental defectives. 
SOCIAL AND PERSONAL FACTORS 


The California Psychological Inventory 
developed by Harrison Gough is a rela- 
tively new instrument (2). It was created 
in the hope that it would measure some 
characteristics of personality which have a 
wide and pervasive application to human 
behavior and which are related to the 
favorable and positive aspects of person- 
ality rather than to the morbid and patho- 
logical. The test is a self-report instrument 
intended primarily for use with normal 
adults and adolescents. The function of the 
profile of scores on this test is to give a 
summary picture of an individual, viewed 
from the social interaction standpoint, that 
is, to tell what sort of person he is in the 
everyday common-sense meaning of the 
phrase. 

The purpose of this study is to explore 
the relationships between the California 
Psychological Inventory (CPI) and cer- 
tain other variables, namely: social status, 
intellectual talent, leadership ability, 
friendships, aggressiveness, and  with- 
drawnness in a high school population. 


TEST INSTRUMENTS USED 


The CPI was administered as part of a 
study being carried out by the Quincy 
Youth Development Project of the Uni- 
versity of Chicago’s Committee on Human 
Development. An entire age group in the 
public schools of a community of 40,000 
is being followed for a period of 10 years. 
This test was administered in March of 
1956 in the fifth year of the project. The 
subjects were in the 10th grade. 

The variables being compared with the 
CPI were arrived at in the following man- 
ner. Socioeconomic status was determined 
by using the Index of Status Character- 
istics (ISC) developed by Warner (3). 
This method consists of obtaining for the 
family of each child a numerical score or 
index which expresses its socioeconomic 
position in the community. 

Leadership ability, withdrawn malad- 
justment, and aggressive maladjustment 
were determined by combining the results 
of two tests designed to screen out children 
displaying these characteristics. One of the 
tests, the “Who Are They?” is a socio- 
metric instrument based upon children’s 
evaluations of their peers with respect to 
the three behavioral characteristics pre- 
viously mentioned, plus a question, “Who 
would you like for your best friends?”, 
which was used to determine the friend- 
ship score. The other test, the “Behavior 
Description Chart,” is a forced-choice 
teacher rating instrument designed to 
screen out children displaying these same 
characteristics except friendship (1). 

The term “aggressive maladjustment” is 
used to typify the youngster who can’t 
control his impulses and who gets into 
trouble because he breaks rules, steals or 
destroys property, fights and quarrels, has 
a disposition to dominate accompanied 
with an indifference to the rights of others, 
or defies his parents, teachers, and others 
in authority positions. Some hostility is 
implied and also a nonconformity of an 
antisocial nature. 

“Withdrawn maladjustment” refers to 
children typified by the terms timid, shy, 
fearful, and hard to get to know. These 
children are insecure, fearful, and over- 
inhibited. They usually do not participate 
in group activities and usually keep their 
feelings to themselves. 

“Leadership ability” refers to a combi- 
nation of three important elements: indi- 
vidual prominence, group goal facilitation, 
and group sociability. 

There were two measures of intellectual 
talent. The first, sixth-grade Intellectual 
Tests, was determined by combining the 
percentile ranks on several intellectual 
tests administered when these children 
were in the sixth grade. The following tests 
were used for each child: the Science Re- 
search Associates’ Primary Mental Abili- 
ties Test for ages 7-11; the verbal, spatial, 
and reasoning subtests of the Chicago 
Primary Mental Abilities Test for ages 
11-17; the Davis-Eells Games; the Good- 
enough Draw-A-Man Test; and the Thur- 
stone Concealed Figures Test. 

The second intellectual measure, the 
Differential Aptitude Test (DAT), was 
given the group in the 11th grade by the 
Illinois Statewide High School Testing 
Program of the University of Illinois. The 
test measures both abstract and verbal 
reasoning which are combined in the total 
score. 


PROCEDURES 


The staff of the Quincy Youth Develop- 
ment Project administered a somewhat 


shortened form of the CPI to the entire 
population of the 10th grade in the Quincy 
Public Schools using group administration 
in English classes at the school. Since chil- 
dren moving into the school system after 
the close of the Sth grade had not been 
tested on all of the instruments used in 
this study, they were excluded from this 
study. Included in the population were 
143 girls and 130 boys. 

Gough divides the CPI into four major 
classes or areas and these are further 
broken down into 18 scales. The staff was 
interested in only 12 of the scales, so it 
administered only 359 of the 480 items on 
the test. Ten of the 12 scales administered 
fell into two of the four major classes 
measured by the CPI, namely, Class I— 
Measures of Poise, Ascendancy, and Self- 
assurance and Class II—Measures of So- 
cialization, Maturity, and Responsibility. 
Five of the six subtests in each of the 
above classes were measured. High school 
norms were established by Gough for boys 
and for girls. These norms were used to 
determine standard scores for each scale. 
The correlations of these CPI scores with 
the other variables are the subject of this 
article. 


TABLE 1 








(Girls N = 143) 





Variables 
Intellectual 
sixth grade 


DAT 
11th grade 


Leadership | Friendship 





(Boys N = 130) 
ISC 
Intellectual 

sixth grade 
DAT 

11th grade .29** .70** 
Leadership .25** .37** 
Friendship .21* .18 
Aggressive -.10 -.11 
Withdrawn -.08 -.28** 


are 
.2R** 











.30** 


.41** .43** 


.49** .25** 


Bae |. 34ee 


.76** 





.70** 
-.23** | 
-.61** 


-.13 
-.45** 














* Reliable at the 2% level of confidence 
** Reliable at the 1% level of confidence 
FINDINGS 


Table 1 shows the intercorrelations of 
the social status, talent, and maladjust- 
ment variables being compared with the 
CPI. 

Table 1 shows that for both sexes there 
is a clustering of the variables being com- 
pared with the CPI. High socioeconomic 
status, high leadership ability, having 
many friends, high intelligence, and lack of 
withdrawnness seem to form a cluster. 
This clustering is somewhat less pro- 
nounced for boys. For instance, leader- 
ship and friendship correlate .41 and .43, 
respectively, with ISC for girls, but only 
.25 and .21 for boys. While the negative 
relationship between withdrawnness and 
ISC and the intellectual measures is sta- 
tistically significant for girls, it is statis- 
tically significant only for sixth-grade in- 
tellectual scores for boys. 

Class I of the CPI, Measures of Poise, 
Ascendancy, and Self-assurance, includes 
the following scales: dominance, capacity 
for status, sociability, self-acceptance, 
sense of well-being, and one scale not 
administered, social presence. Class II, 
Measures of Socialization, Maturity, and 
Responsibility, includes: responsibility, so- 
cialization, self-control, tolerance, com- 
munality, and one scale not administered, 
good impression. The “total” category in 
Table 2 refers to the mean score given 
each individual on these 10 scales and 
three others, namely, intellectual efficiency, 
flexibility, and a scale not included by 
Gough, college attendance. Table 2 gives 
these correlations. 


DISCUSSION 


Table 2 shows that scores on the CPI 
are significantly related to a number of 
the variables being studied. Desirable or 
high scores on the CPI are associated with 
high socioeconomic status, intelligence, 
leadership ability, and being desired as a 
friend. There is a strong negative relation- 


TABLE 2 
CORRELATION OF THE CALIFORNIA PSYCHOLOGICAL INVENTORY SCORES 
WITH THE OTHER VARIABLES 


Status, Talent, and Maladjustment Variables 

Intellectual sixth grade | DAT 11th grade | Leadership | Friendship | Aggressive | Withdrawn 

(Girls N = 143) 








.47** 


| .45** 
.49** 


— 
.62** 
.55** 


Class I CPI 
Class II CPI 
Total CPI | .64** | 
Total CPI with ISC Partialed Out | .46** | 








(Boys N = 130) 





.35** 
.41** 
.44** 
.40** 


Class I CPI 

Class II CPI 

Total CPI 

Total CPI with ISC Par- 
tialed Out 


.27** 
.34** | 
.32** | 
.28** 





* Reliable at the 2% level of confidence 
** Reliable at the 1% level of confidence 
ship to withdrawnness, while the relation- 
ship to aggressiveness is less clear. In 
general, these associational tendencies are 
stronger for girls than for boys. 

Social status. Class II of the CPI meas- 
ures those aspects of personality which 
have to do with one’s relationships with 
others in terms of playing a socially re- 
sponsible role in society. There is a sig- 
nificant positive relationship between so- 
cioeconomic status and Class II for both 
sexes. Middle class adolescents more often 
have socially acceptable attitudes towards 
others than do lower class adolescents. 

Class I measures those aspects of per- 
sonality which are related to one’s attitude 
toward himself. Here socioeconomic posi- 
tion seems to be more important for girls 
than for boys although there is a tendency 
in both sexes for middle class youngsters 
to be more self-assured. In Class I, the 
correlations between ISC and leadership 
and friendship were much higher for girls 
than for boys. Thus, from a socioeconomic 
standpoint, boys’ society seems to be more 
open. Lower class boys can rise to posi- 
tions of leadership and feelings of self- 
assurance and self-acceptance more easily 
than can their sisters. Perhaps athletic 
ability plays something of the same role 
in boys’ society that social status does for 
girls. 

Intellectual talent. The correlations be- 
tween intelligence and adjustment as 
measured by the CPI are statistically sig- 
nificant in every instance even when ISC 
is partialed out. These correlations are 
higher for girls than for boys. The corre- 
lations between the CPI and the DAT 
given in the 11th grade are somewhat 
higher than the correlations with the sixth- 
grade intellectual tests, but not signifi- 
cantly so. 
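
The phrase "ISC partialed out" refers to a first-order partial correlation. A minimal sketch of that computation follows, assuming the standard formula; the function name `partial_corr` and the three input correlations are hypothetical illustrations, not values taken from Table 2.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z held constant (first-order partial)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs: x = total CPI, y = an intellectual measure, z = ISC.
r_partial = partial_corr(r_xy=0.60, r_xz=0.40, r_yz=0.30)
print(round(r_partial, 3))
```

When the third variable is related to both of the others in the same direction, the partial correlation is smaller than the raw one, which is the pattern the "ISC Partialed Out" row of Table 2 is meant to show.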

Friendship and leadership. Leadership 
ability and its closely related category, 
friendship, are highly related to favorable 
scores on the CPI. There is some indica- 
tion that the scores of children chosen as 
leaders and friends are more highly re- 
lated to measures of poise, ascendancy, 
and self-assurance than to measures of 
socialization and responsibility. 

Withdrawnness. Withdrawnness has a 
negative relationship to adjustment as 
measured by the CPI, and this is especially 
true for girls. This difference between the 
sex groups is due in part to the fact that 
withdrawn girls are more likely to come 
from lower class homes than are with- 
drawn boys. Withdrawn children usually 
have very low scores on measures of poise 
and self-assurance. While there is a nega- 
tive relationship between withdrawnness 
and such measures as responsibility, social- 
ization, tolerance, and communality, this 
relationship is much less pronounced and 
is not statistically significant for boys. 

Table 3 shows the standard scores of the 
most withdrawn boys and girls in compari- 
son to the rest of their classmates on each 
of the scales. If these groups had had the 
mean Gough obtained on his standardizing 
population, their means would have been 
50 with a standard deviation of 10. 
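
The standard-score scale mentioned here (mean 50, standard deviation 10 in the norm group) can be illustrated with a short sketch. The function name `standard_score` and the raw score and norm values below are hypothetical; Gough's actual norm tables are not reproduced in this article.

```python
def standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw scale score so the norm group has mean 50 and SD 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical example: raw score 22 on a scale whose norm group has
# mean 25 and SD 6, i.e. half a standard deviation below the norm mean.
print(standard_score(22, norm_mean=25, norm_sd=6))
```

On this scale a group mean of 40 lies one norm-group standard deviation below the standardizing population, which is how the entries in Table 3 are to be read.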

Table 3 indicates that, as a group, the 
most withdrawn children have scores on 
the CPI scales which are less favorable 
than those of nonwithdrawn children. They 
have particularly low scores on the scales 
in Class I, the differences being statistically 
significant for both sexes. Their mean 
scores for sociability and for self-accept- 
ance are more than a standard deviation 
below the mean for the rest of the group. 
The one subscale which runs counter to 
the general trend, sense of well-being for 
boys, cannot be explained, but this differ- 
ence is not statistically significant. 

Aggressiveness. Table 4 indicates that, 
in general, the highly aggressive adolescent 
girls have above average scores on the 
scales measuring poise, ascendancy, and 
self-assurance, Class I. This difference is 
significant at the 1% level of confidence. 
While the aggressive girls did relatively 
well on the scales measuring socialization, 
TABLE 3 


MEAN STANDARD SCORES OF THE MOST WITHDRAWN BOYS AND GIRLS AS 
COMPARED TO THEIR CLASSMATES ON CERTAIN CPI SCALES 


CPI Scales 











Variables 





| CI| Re “So | Se | To | cm lcm 
Most Withdrawn | | | 
Girls (V = 21) |44.1 /41.0 [37.0 [35.4 40.3 |39.6*145.6 45.3 46.7 144.7 [40.5 44.6* 


Do | Cs Os | Sy 
—a "mes 
Less Withdrawn " a 





Sa | Wo 
wll Peet 





Girls (NV = 145) 50.8 48.6 | 47. 4 49.5*|51.2 a 3 |52.2 51. 1 M7. 


J 
"| 
Most Withdrawn a | eg Ok 





Boys (N = 22) 43.0 46.2 42. 9 2 5 a 5 |44. — 3 M7. 6 a. 0 a7. 4 44. 
Less Withdrawn 

Boys (N = 129) se. |0o.8 lots 1.0 [7.8 [s.s+4.2 |o0.2 foo.s 49.2 47. 

Note.—Do—Dominance, Cs—Capacity for Status, Sy—Sociability, Sa—Self-acceptance, Wb—Sense of Well- 
being, CI—Class I, Re—Responsibility, So—Socialization, Se—Self-control, To—Tolerance, Cm—Communality, 
C II—Class II. 

* Differences reliable at the 1% level of confidence. 
* Differences reliable at the 1% level of confidence. 


maturity, and responsibility, the aggressive 
boys had scores which were significantly 
below those of the other boys. 

Thus, for the group as a whole, aggres- 
siveness tends to take a somewhat different 
form in boys than in girls, and this makes 
the pattern of relationship to the CPI 
somewhat less obvious. While there is a 
slightly positive relationship between so- 
cial status and aggressiveness for girls, 
there is a slightly negative relationship be- 
tween social status and aggressiveness for 
boys. Neither relationship is statistically 
significant. 

Although, in general, boys get far more 
aggressive mentions than do girls, girls get 
more than twice as many nominations to 
questions such as, “Who says unfriendly 
things about people behind their backs?” 
Girls more frequently nominate as aggres- 
sive those girls with high social status who 
aspire to be leaders. Thus, there is even a 
low positive relationship between aggres- 
siveness and poise, ascendancy and self- 
assurance for girls. 

Boys, on the other hand, more often ex- 
press their aggressiveness by misbehaving 
in class, or by physically pushing someone 
around. This is behavior typically associ- 
ated with lower class people. Thus, aggres- 
siveness as expressed by boys is less often 
found among the leaders and there is a sta- 
tistically significant negative relationship 
between aggressiveness and such measures 
as socialization, responsibility, and self- 
control. 

Interpretation of scores. Correlations 
between the CPI and the other variables 
should be taken into account in interpret- 
ing the results from the CPI. For example, 
since there are rather high positive corre- 
lations between adjustment as measured 
by the CPI and variables such as intelli- 
gence and high social status, when a middle 
class child with high intelligence has even 
an average CPI score his counselor might 
well regard this as a possible indication of 


psychological disturbance. 


SUMMARY 


In almost every instance there are sta- 
tistically significant relationships between 
the California Psychological Inventory 
(CPI) and its subsections and such meas- 
ures as socioeconomic status, intelligence, 
and leadership ability. Further, the self- 
reports of adolescents on the CPI correlate 
with ratings of their psychological adjust- 
ment as seen by their teachers and peers. 

The CPI scores of children with high 
socioeconomic status, intellectual ability, 
and/or leadership talent are on the aver- 
age considerably higher than are those of 
children without these advantages. How- 
ever, since there is considerable evidence 
to show that maladjustment, below aver- 
age ability, and low socioeconomic status 
are correlated, these findings do seem to 
conform to the facts of life. 
It is well known that the difference be- 
tween two test scores is likely to be con- 
siderably less reliable than either score 
alone. We are accustomed to reliabilities of 
.90 or more (at least in heterogeneous 
groups) for many published tests in the 
cognitive area. How can we justify the use 
in individual guidance of difference scores 
that have a reliability of only .70? Or .60? 
These hardly seem usable according to the 
standards that we apply to ordinary apti- 
tude and achievement scores, yet higher 
reliabilities are frequently unobtainable. 

In 1923, Kelley presented a method for 
determining what proportion of the dif- 
ferences between two test scores is in ex- 
cess of chance (3; 4, p. 418). This method, 
or a revision of it (6), is used in some cur- 
rent test manuals. It does not seem to the 
present writer to be as useful as the simple 


reliability coefficient of the difference 
score, for which a convenient formula is 
derived by Gulliksen (2, p. 353): 


$$ r_{dd} = \frac{(r_{xx} + r_{yy})/2 - r_{xy}}{1 - r_{xy}} = \frac{\bar{r} - r_{xy}}{1 - r_{xy}} \qquad [1] $$

where $r_{dd}$ is the reliability of the difference 
$d = y - x$, $x$ and $y$ being expressed as 
standard scores, or at least scaled so that 
their standard deviations are equal; $r_{xy}$ 
is the correlation between $x$ and $y$; and 
$\bar{r} = (r_{xx} + r_{yy})/2$ is the arithmetic mean 
of the reliability coefficients for tests x 
and y. (It should be emphasized that this 
formula is valid only when x and y are 
scaled so that their standard deviations 
are equal. Published test batteries requir- 
ing the comparison of two test scores fre- 
quently meet this condition. If they do 
not, then more complicated formulas are 
required.) 
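
As a worked illustration of formula [1], the following sketch computes the reliability of a difference score from the two test reliabilities and their intercorrelation. The function name `difference_reliability` and the numerical inputs are hypothetical; the formula itself assumes, as stated above, that x and y have equal standard deviations.

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of d = y - x by formula [1] (equal standard deviations assumed)."""
    if r_xy >= 1.0:
        raise ValueError("r_xy must be less than 1")
    mean_reliability = (r_xx + r_yy) / 2.0
    return (mean_reliability - r_xy) / (1.0 - r_xy)


# Hypothetical example: two tests each with reliability .90 that correlate .70.
r_dd = difference_reliability(r_xx=0.90, r_yy=0.90, r_xy=0.70)
print(round(r_dd, 2))
```

The example shows how quickly the reliability of a difference falls below that of either test as the correlation between the two tests rises.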

The purpose of the present paper is 
(a) to call attention to a natural way in 


which difference scores having relatively 
low reliability may be (and currently are) 
used effectively, (b) to suggest a method 
for inferring from the reliability coefficient 
of difference scores their effectiveness when 
used as outlined. 

The utility for individual guidance of 
any measure with low reliability is greatly 
increased under two conditions of use: 

1. The measure is used only to make 
broad rather than pinpoint judgments; 
e.g., the score is used only to decide 
whether John is above or below the aver- 
age for his class, rather than to infer his 
exact percentile standing. 

2. Judgments are made only about those 

examinees with somewhat extreme scores; 
no judgment is made about those exam- 
inees with borderline scores. 
The foregoing conditions are approximated 
when difference scores are used according 
to some convenient rule such as the fol- 
lowing (1, pp. 11-12), recently advocated 
by John E. Dobbin: 

1. Each individual test score is plotted 
on a profile chart as a confidence band ex- 
tending one standard error of measure- 
ment above and below the actually ob- 
tained score. 

2. The difference between scores on two 
tests is treated as real only when the two 
confidence bands do not overlap. 
(Alternative convenient rules, appropriate 
for various situations, will not be elabo- 
rated here.) 
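
The two-step rule above can be sketched as a simple overlap check. The function names and the scores below are hypothetical, and bands that merely touch are treated here as overlapping, which is an assumption about a boundary case the rule does not spell out.

```python
def bands_overlap(score_x: float, score_y: float,
                  sem_x: float, sem_y: float) -> bool:
    """True if the bands score +/- one standard error of measurement overlap."""
    lo_x, hi_x = score_x - sem_x, score_x + sem_x
    lo_y, hi_y = score_y - sem_y, score_y + sem_y
    return lo_x <= hi_y and lo_y <= hi_x


def difference_is_real(score_x: float, score_y: float,
                       sem_x: float, sem_y: float) -> bool:
    """Treat the difference as real only when the two bands do not overlap."""
    return not bands_overlap(score_x, score_y, sem_x, sem_y)


# Hypothetical profile: standard errors of measurement of 3 points on both tests.
print(difference_is_real(55, 62, 3, 3))  # bands 52-58 and 59-65
print(difference_is_real(55, 60, 3, 3))  # bands 52-58 and 57-63
```

Non-overlap of the bands is the same condition as the absolute difference exceeding the sum of the two standard errors of measurement.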

The effectiveness of any difference score 
when used according to the foregoing rule 
can be evaluated by asking: “What pro- 
portion of the interpretations made from 
the difference scores will be correct?” This 
question is readily answered as soon as the 
reliability of the difference score has been 
computed from [1]. As an approximation, 
it will be assumed here that tests x and y 
have equal standard errors of measure- 
ment ($\mathrm{S.E.}_x = \mathrm{S.E.}_y$). Moderate devia- 
tions from this assumption will cause only 
minor differences in the results obtained. 
A difference, $d$, is to be treated as a real 
difference only if 

$$ |d| > 2\,\mathrm{S.E.}_x \qquad [2] $$

where $|d|$ represents the absolute value of 
the difference. Since errors of measurement 
on tests x and y are presumably inde- 
pendent, 

$$ \mathrm{S.E.}_d = \sqrt{\mathrm{S.E.}_x^2 + \mathrm{S.E.}_y^2} = \sqrt{2}\,\mathrm{S.E.}_x \qquad [3] $$

By the usual formula for any standard er- 
ror of measurement (2, p. 15), 

$$ \mathrm{S.E.}_d = \sigma_d \sqrt{1 - r_{dd}}; $$

or, expressing $d$ in standard measure so 
that $\sigma_d = 1$, 

$$ \mathrm{S.E.}_d = \sqrt{1 - r_{dd}} \qquad [4] $$

Note from [3] that $\mathrm{S.E.}_x = \mathrm{S.E.}_d/\sqrt{2}$. 
Substitute this value of $\mathrm{S.E.}_x$ into the right 
side of [2] and replace $\mathrm{S.E.}_d$ by the right 
side of [4] to show that the difference will 
be treated as a real difference if and only 
if 

$$ |d| > \sqrt{2(1 - r_{dd})} \qquad [5] $$

The “cutting point” given by the right 
side of [5] will be denoted by $k$. 

$$ k = \sqrt{2(1 - r_{dd})} $$

only when the tests are scaled so that 
$\sigma_d = 1$; if, on the other hand, the tests are 
scaled so that $\sigma_x = \sigma_y = 1$, then 

sumed here that the bivariate distribution 
is normal. 

The proportion of cases lying above 
d = +k in the bivariate distribution is 
the proportion of examinees for whom the 
judgment A > 0 is made. The proportion 
of these cases that actually lie above A = 0 
is the proportion of judgments correctly 
made. Similar statements hold for the 
judgment A < 0. 

Once the reliability of the difference 
score is known, these proportions are 
readily determined from tables of the bi- 
variate normal frequency distribution (5, 
vol. 2). Some illustrative results are given 
in Table 1. The last two columns represent 
both judgments of 4 > 0 and judgments of 
4< 0. 

The tabled values show that even dif- 
ference scores with very low reliability, by 
conventional standards, can provide many 
confident judgments. As here used, differ- 
ence scores with a reliability of only .42, 
for example, provide useful judgments 
about more than a quarter of the exam- 
inees, with 90% confidence in the correct- 
ness of the judgments made. Difference 
scores with a reliability of .64 provide use- 
ful judgments about 40% of the examinees 
with 95% confidence. 

It should be noted in this connection 
that each judgment states merely that a 
certain student has a positive or negative 
true difference; consequently, even ran- 


TABLE 1 











[Proportion o a , , 
examinees for Proportion o' 
whom judg- | judgments 
ments are correct 
made 


Reliability of 
difference 
(rad) 


Cutting 
point (k) 





og = V2A1—rsy) and k = ogV/2(1 — rea) 
= 2V (1 — raa)(l — ray). 


Consider now the bivariate frequency 
distribution between the observed differ- 
ence, d, and the true difference, A. It is 
weil known that the correlation between 
d and A is equal to ~/raq. It will be as- 
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dom judgments will be correct 50% of the 
time. This fact is shown by the last entry 
in the first row of the table. 

If difference scores with a reliability of 
.90 ever actually occur, some other rule 
will probably be preferable to the one used 
here, because the present rule in this un- 
usual case leads to so much caution that 
only 1% of the judgments made are in- 
correct. This situation illustrates the fact 
that the cautiousness of the rule used 
should be adjusted to each situation; i.e., 
if the reliability of the differences is .90, 
then perhaps a difference should be judged 
as real whenever $|d| > 1\tfrac{1}{2}\,\mathrm{S.E.}_x$, or when- 
ever |d| > some other multiple of $\mathrm{S.E.}_x$. 
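
The proportions discussed in this section can be approximated numerically. The sketch below is not the procedure used in the text, which relies on published tables of the bivariate normal distribution. It assumes that the observed and true differences are standardized and bivariate normal with correlation equal to the square root of the reliability, and it integrates with a simple midpoint rule. The function names and the `multiple` argument (the multiplier applied to the standard error of a single test) are illustrative.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cutting_point(r_dd, multiple=2.0):
    """Cutting point k for standardized d; multiple=2 gives sqrt(2 * (1 - r_dd))."""
    return multiple * math.sqrt((1.0 - r_dd) / 2.0)


def proportion_judged(r_dd, multiple=2.0):
    """Proportion of examinees whose |d| exceeds the cutting point."""
    return 2.0 * (1.0 - normal_cdf(cutting_point(r_dd, multiple)))


def proportion_correct(r_dd, multiple=2.0, upper=8.0, steps=4000):
    """P(true difference > 0 | d > k), by midpoint-rule integration over d."""
    if not 0.0 <= r_dd < 1.0:
        raise ValueError("r_dd must satisfy 0 <= r_dd < 1")
    rho = math.sqrt(r_dd)            # correlation between d and the true difference
    spread = math.sqrt(1.0 - r_dd)   # conditional s.d. of the true difference given d
    k = cutting_point(r_dd, multiple)
    width = (upper - k) / steps
    joint = 0.0
    for i in range(steps):
        d = k + (i + 0.5) * width
        joint += normal_pdf(d) * normal_cdf(rho * d / spread) * width
    return joint / (1.0 - normal_cdf(k))


for r_dd in (0.0, 0.42, 0.64, 0.90):
    print(r_dd, round(proportion_judged(r_dd), 2), round(proportion_correct(r_dd), 2))
```

Passing a smaller `multiple`, such as 1.5, shows how a less cautious rule trades a larger proportion of judgments made against a smaller proportion correct.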
The selection of tests to use for a high 
school guidance program presents the 
school counselor and administrator with 
many problems. Limited budgets may seri- 
ously curtail using enough tests to cover 
all the areas of academic ability, special 
aptitudes, achievement, interests, and per- 
sonality. Even where the budget might 
permit a wider selection of tests the time 
required may make their administration 
impossible. Lack of personnel trained in 
test usage and guidance may constitute a 
further hindrance to appropriate use of 
enough tests. Where measures of both 
academic ability and special aptitudes are 
desired, the Differential Aptitude Test 
Battery (DAT) may well be a wise choice. 
The present report seeks to extend the 
existing information bearing on the use- 
fulness of the DAT for predicting long- 
term academic success. 

Originally the DAT was included in the 
Minnesota State-Wide Testing Program to 
give the high school counselor measures of 
special aptitudes to use in conjunction with 
measures of academic ability. However, 
it has always been implicitly understood 
that the DAT was perhaps most useful in 
predicting academic success. The DAT 
manual lists many coefficients of correla- 
tion which support this viewpoint. How- 
ever, few data are presented indicating 
long-term predictive validity of the bat- 
tery, and no coefficients based on com- 
binations of DAT subtests are reported. 

The DAT was introduced into the State- 
Wide Testing Program in 1951-52. At that 
time 108 schools administered approxi- 
mately 4,600 DAT’s. In 1954 after the 
same high schools had completed testing 
their juniors on different tests, 27 of the 
108 schools (approximately 20%), were 
selected at random. For the 27 schools, all 
students were selected who had a complete 
set of both DAT scores and junior meas- 
ures. These data were combined for all 
schools and statistics computed on the 
pooled data. From this selection procedure, 
629 boys and 532 girls were available for 
the study. 

The three academic measures obtained 
at the 11th grade, the 1952 American 
Council on Education Psychological Ex- 
amination (ACE), the Cooperative English 
Test (English) scores and high school 
percentile rank (HSR), are those meas- 
ures most depended upon by the Univer- 
sity and the colleges of Minnesota for the 
selection and counseling of students. Re- 
lationships of high school freshmen test 
scores to these measures thus forge an 
important chain of information about a 
student for the entire high school career. 

Tables 1 and 2 show the means and 
standard deviations of the variables and 
the correlation coefficients of each individ- 
ual Differential Aptitude Test with each 
of the junior measures for boys and girls. 
The distributions for girls consistently 
have smaller SDs than do those for boys, 
except for the Clerical Test. Of the mean 
scores, boys are higher on Abstract Rea- 
soning, Mechanical Reasoning, and ACE. 
Girls are higher on Clerical, HSR, and 
English. 

Only two parts of the DAT, Verbal 
Reasoning and Numerical Ability, com- 
bine significantly in a coefficient of multi- 
ple correlation in predicting the academic 
measures. For boys these coefficients are 

TABLE 1 
INTERCORRELATIONS OF NINTH GRADE DIFFERENTIAL APTITUDE TEST BATTERY SCORES, 
ELEVENTH GRADE ACE PSYCHOLOGICAL EXAMINATION, COOPERATIVE ENGLISH 
TEST SCORES, AND HIGH SCHOOL RANK 
(Boys N = 628) 

MR | CSA | ACE | 
9.7 | 48.6 | 93.5 | 116.7 | 44.7 
1.4 | 10.4 | 25.9 | 38.2 | 29.0 

Coefficients of Correlation 

1. Verbal Reasoning | .65 | .59 | .52 | .37 
2. Numerical Ability | .61 | .35 
3. Abstract Reasoning | .44 
4. Space Relations | .51 
5. Mechanical Reasoning 
6. Clerical Speed and Accuracy 
ACE 
Cooperative English Test 

TABLE 2 
INTERCORRELATIONS OF NINTH GRADE DIFFERENTIAL APTITUDE TEST BATTERY SCORES, 
ELEVENTH GRADE ACE PSYCHOLOGICAL EXAMINATION, COOPERATIVE 
ENGLISH TEST SCORES, AND HIGH SCHOOL RANK 

19.5 | 19.5 
8.1 | 8.0 

Coefficients of Correlation 

DAT 
1. Verbal Reasoning | .62 | .50 | .11 
2. Numerical Ability | .10 
3. Abstract Reasoning | .59 | .13 
4. Space Relations | .11 
5. Mechanical Reasoning | .06 
6. Clerical Speed and Accuracy 
ACE 
Cooperative English Test 

.69, .67 and .63 for ACE, Cooperative 
English and HSR, respectively. For girls 
these coefficients are .71, .68 and .61, re- 
spectively. 

The DAT user should not be unduly 
surprised that some parts of the DAT 
have low relationships with such academic 
measures as the junior ACE and English 
test scores and high school rank. These 
relationships do not invalidate the use of 
the DAT with those students who do not 
have high academic promise. It may, in 
fact, make the DAT even more important 
in the guidance program where grades, test 
scores, and other background information 
indicate that a ninth grader does not have 
good college potential. The DAT is one 
set of tests which will indicate to what 
degree the student possesses special apti- 
tudes which he may capitalize upon in 
making a vocational choice other than one 
requiring college. 

The results of predicting the junior 
measures using the parts of the DAT, 
singly or in combination, can be compared 
to a parallel study by Layton (1) showing 
the relationship between senior ACE and 
Cooperative English Test scores, and HSR 
and ninth grade ACE and Cooperative 
English results for the two sexes combined. 

Multiple correlations for the combined 
DAT-VR and -NA tests are somewhat 
lower for predicting the junior test results 
than are the correlations between the 
freshmen- and senior-year tests as reported 
earlier by Layton (1). However, Layton 
studied different editions of the same test, 
whereas in this study the tests are of a 
somewhat different type. The combined 
VR and NA prediction shows comparable 
correlations to the freshmen ACE when 
predicting HSR. In the earlier Layton 
study, ninth grade ACE correlated .63 
with HSR compared with the DAT-HSR 
multiple correlations of .63 (boys) and 
.61 (girls) reported above. 

The following conclusions appear to have 
significant implications for counseling use 
of the DAT. The Verbal Reasoning Test 
is the best single predictor of the junior 
test scores and of high school rank, and 
the combination of Verbal Reasoning and 
Numerical Ability Tests gives a slightly 
higher correlation with the junior measures 
than does the Verbal Test alone. 
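
The gain from combining two predictors can be illustrated with the standard formula for a multiple correlation based on two predictors. The sketch below is not taken from the report; the function name is illustrative, and the three correlations in the example are hypothetical values rather than entries from Tables 1 and 2.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: a verbal test correlating .60 with the criterion, a
# numerical test correlating .55, and the two tests correlating .65 with each other.
print(round(multiple_r_two_predictors(0.60, 0.55, 0.65), 3))
```

With predictors that overlap this much, the multiple correlation is only modestly higher than that of the better single predictor, which is the pattern the conclusions describe.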
This research brings together two de- 
velopments in the measurement of moti- 
vation. McClelland’s technique for meas- 
uring n Ach (4) was extended in an 
attempt independently to measure eco- 
nomic, academic, social, and professional 
needs for achievement as well as the 
general need that McClelland usually 
measures. This instrument was adminis- 
tered to a sample of Ss who also took a 
revised form of the author’s questionnaire 
designed to measure these four specific 
needs for Achievement (2). If the two 
types of measures of these motivations 
did not agree, criterion information (in the 
form of college grades for the academic 
criterion, college social activities for the 
social criterion, and ratings of professional 
promise by faculty members for the pro- 
fessional criterion) was provided to 
make it possible to evaluate the relative 
merits of the two measurement tech- 
niques. 

This research, then, attempts to answer 
the following questions: 


1. Can McClelland’s technique for meas- 
uring n Ach be readily modified so that it 
measures several different needs for achieve- 
ment? 

2. Will scores on these several needs for 
Achievement obtained from McClelland’s 
technique agree with scores from a question- 
naire aimed at the same several needs? 

3. What relationships are there between 
either type of measure of these selected 
needs and criteria which should reflect their 
presence? 


PROCEDURE 


Appropriate pictures were selected for 
eliciting expressions of various needs for 
achievement. To represent McClelland’s 
own efforts in this direction, pictures C, 
G, and H of the series described in The 
Achievement Motive (4), were used. From 
the work of Ricciuti and Sadacca (5, 6) 
three pictures were selected which seemed 
to be most successful in terms of their 
cross-validation against grades which had 
been adjusted for aptitude. These pictures 
were numbers 8, 24, and 28 (6). These two 
sets of three pictures each were incorpo- 
rated into the study to act as reference 
measures. In this way we meant to be 
certain that n Ach is measured by some 
of the best measuring materials available 
whether or not the new experimental 
pictures function properly. 

For new pictures to tap n Ach of an 
economic sort three pictures were selected. 
One shows a man and a woman making 
out a check in a bank or travel bureau. 
The second shows a man in working 
clothes getting a check from the pay- 
master. The third shows a man buying a 
camera on time payments. 

The three social pictures show a number 
of people dancing in one picture, a young 
man having tea with two young ladies, and, 
in the third, two distinguished gentlemen 
chatting at a formal party. 

The academic pictures show a young 
man glowering over books in a library, 
a boy with a book at a classroom desk 
(facing the front of the room), and, finally, 
a young man boarding a train with a suit- 
case decorated with a “State” pennant. 

The three professional pictures were of 
a man leading a small, informally-uni- 
formed orchestra, a man working with 
chemical apparatus, and a number of well- 
dressed men about a conference table. 

The Ss were entering law students at 
an Eastern university during the fall of 
1956. The n Ach test was given during a 
regular class period. The author scored all 
of the stories according to McClelland’s 
scoring system C. These Ss had partici- 
pated in another study earlier in the fall 
during which they had taken the revised 
aspirations questionnaire mentioned above. 
This questionnaire is composed of four 
intermingled sets of 16 items, making a 
total of 64 items. The four sets of items 
are designed to measure economic, social, 
professional, and academic aspirations in 
terms of responses to items such as: 


“At what level do you expect to be an 
officer in organizations such as the Ameri- 
can Legion, Rotary, country clubs, fraterni- 
ties, church groups, etc., by the twentieth 
year after finishing law school?” 

“What contribution do you expect to 
make to your vocation in the way of 
inventions, theories, research, etc., within 
twenty-five years after finishing law school?” 

“What will be the value of the automobile 
that you or you and your mate expect to 
own during the tenth year after finishing 
law school?” 


Each of the items has four alternative 
responses which lie along a dimension run- 
ning from less (lower status, less educa- 
tion, less money, lower grades, etc.) to 
more. The internal-consistency reliabilities 
of these four scores were computed on a 
different sample of 228 similar students. 
The reliability of the economic score 
was .83, that of the social score was .81, 
the academic score .69, and the profes- 
sional score .80. 

The criterion information obtained was 
first-semester and first-year law school 
grades, ratings of professional promise by 
the faculty members, and counts of the 
number of undergraduate and law school 
organized social groups to which each 
student belonged, according to the records 
kept by the law school. The reliabilities 
of the first semester average grade and 
the first year average grade were evalu- 
ated using the technique described by 
Ebel (1). The reliability of the former 
was .56, of the latter .59. The two sets of 
grades correlated with each other to the 
extent of .81, but the magnitude of this 
correlation is partly spurious due to over- 
lap. 

* The help of the law school in arranging 
for and assisting with this administration and 
providing the criterion information is greatly 
appreciated. 

Faculty ratings were obtained for the 
purpose of this research. The character- 
istic rated was the “likelihood that each 
of the people whose names appear on the 
scale will ultimately become outstanding 
members of the legal profession (or some 
other profession).” Raters were cautioned 
to avoid confusing the amount of money 
a person might make or be worth with his 
professional promise. Six sets of ratings 
were obtained. Three raters provided data 
for 50 of the 56 Ss. The smallest number 
rated by any of the faculty was 24. 
The reliability of these average ratings 
was estimated, using Ebel’s technique (1), 
to be .80. 

The kinds of social activities counted 
for the social criteria were such things as 
membership in fraternities, debate so- 
cieties, student bar associations, B'nai 
B'rith Hillel Foundation, Young Demo- 
crats, etc. No reliability estimate is avail- 
able for these data but the correlation be- 
tween the number of undergraduate 
activities and the number of law school 
activities is only .24, not significantly dif- 
ferent from zero for 55 cases. 
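
A rough check on statements of this kind can be made with the Fisher z transformation. The text does not say which significance test was used, so the sketch below is an assumption: a two-tailed normal approximation, with an illustrative function name.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-tailed p-value for H0: population correlation = 0."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r) * math.sqrt(n - 3)  # Fisher z divided by its standard error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The correlation of .24 based on 55 cases mentioned in the text.
print(round(fisher_z_p_value(0.24, 55), 3))
```

Under this approximation a correlation of .24 from 55 cases does not reach the .05 level, in line with the statement above.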


RESULTS 


With an N of 56, the average of the 
three intercorrelations among the three 
economic pictures is .18, for the social 
pictures it is .02, for the academic pictures 
it is .23, and for the professional area 
it is .10. In terms of test reliability these 
figures are very discouraging. It would 
seem that the author was inordinately 
clumsy in selecting pictures except for 
the fact that McClelland’s three pictures 
only correlated with each other to the 
extent of an average of .12, and Ricciuti’s 
three best out of the great many pictures 
he studied correlated with each other on 
the average -.02. 

A total score based on 18 pictures was 
examined. Its reliability was estimated 
by finding the average intercorrelation 
among all the pictures and boosting this 
by means of the Spearman-Brown formula. 
The 18-picture coefficient of .64 is fairly 
good. 
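
The Spearman-Brown step described above can be sketched as follows. The function names are illustrative. The text reports only the stepped-up coefficient of .64, so the average intercorrelation printed by the second call is a back-calculation, not a figure from the study.

```python
def spearman_brown(mean_r, n_parts):
    """Reliability of a total of n_parts parts whose average intercorrelation is mean_r."""
    return n_parts * mean_r / (1.0 + (n_parts - 1) * mean_r)


def implied_mean_r(reliability, n_parts):
    """Invert the Spearman-Brown formula to recover the average intercorrelation."""
    return reliability / (n_parts - (n_parts - 1) * reliability)


# Three economic pictures with the reported average intercorrelation of .18.
print(round(spearman_brown(0.18, 3), 2))

# Average intercorrelation implied by an 18-picture reliability of .64.
print(round(implied_mean_r(0.64, 18), 3))
```

The comparison shows how a long test built from weakly intercorrelated pictures can reach a moderate reliability while any three-picture score remains quite unreliable.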

The intercorrelations between total 
picture-story scores in each area, the 
reference picture-story total scores, the 
total questionnaire scores in each area, 
and the total picture-story score for all 
18 pictures, are contained in Table 1. It 
can be seen in this table that the inter- 
correlations between the four questionnaire 
scores are appreciable. In fact, the cor- 
relation between the academic and pro- 
fessional scores when corrected for at- 
tenuation approaches 1.00. The presence 
of such large correlations between areas 
does not occur with the picture test, but 
this seems to be at least partly a function 
of the low reliabilities of the picture scores 
when based on only three pictures. 
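
The correction for attenuation referred to here divides an observed correlation by the geometric mean of the two reliabilities. In the sketch below the reliabilities of .69 (academic) and .80 (professional) are those reported for the questionnaire, but the observed correlation of .70 is a hypothetical value, since the table entries are not reproduced in this text. The function name is illustrative.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed correlation."""
    if r_xx <= 0.0 or r_yy <= 0.0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical observed correlation of .70 between the academic and
# professional questionnaire scores.
print(round(correct_for_attenuation(0.70, 0.69, 0.80), 2))

# Observed correlation at which the corrected value would reach 1.00.
print(round(math.sqrt(0.69 * 0.80), 2))
```

Because the divisor shrinks as the reliabilities fall, underestimated reliabilities can push a corrected correlation above 1.00.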

It is clear from the table that scores 
from pictures selected to induce fantasy 
in a particular area and scores from 
questions directed at assessing expecta- 
tions of success in that area do not cor- 
relate to any great degree. Except for 
the social area, these correlations tend to 
be slightly negative. We might expect that, 
even if the scores for particular kinds of 
motivation do not agree (perhaps largely 
due to unreliability of the picture scores), 
the over-all n Ach score from a large 
number of pictures should be related to 
how successful in life a person says he 
expects to be. The data reveal that even 
these correlations are negligible, and lack 
of reliability cannot be used to explain 
this phenomenon. 

TABLE 1 
INTERCORRELATIONS AMONG MEASURES OF MOTIVATION 

These data give no support to the 
notion that motivation scores derived 
from the two techniques agree with each 
other. Next we ask which kind of measure 
of striving is most highly related to 
achievement. For the students who finished 
the first semester (N with complete data 
is 55) we have the correlations between 
four criteria (first semester grades, under- 
graduate social activities, law school social 
activities, and professional ratings) and 
the four questionnaire scores, the seven 
picture test scores, and the Law School 
Admission Test, a measure which we can 
consider for our purposes here to be 
general intelligence. 

First, the Law School Admission Test 
has no significant correlations with any 
of the criteria. This lack of significant 
relationship may be limited to the present 
sample, since in previous years studies 
have shown fairly high validities for this 
test at this institution. However, the low 
correlations between LSAT scores and the 
other variables in this study make it un- 
necessary to partial out the effect of in- 
telligence as measured here. 

Since LSAT scores do not account for 
much of the variance in the criteria, there 
is plenty of room for other variables 
to be operating. We are surprised to find 
that none of our 11 measures of motivation 
is significantly correlated with first-semes- 
ter law school grades. The highest cor- 
relation with grades (not statistically sig- 
nificant) is a negative correlation of .22 
for the measure derived from the three 
pictures used by McClelland. It might 
be hoped that this finding is due to first- 
semester grades being inferior stuff. First- 
year grades might be of better quality. 
Unfortunately, first-year grades do not 
change the picture at all. They correlate 
.81 with first-semester grades (partly spu- 
rious), none of the motivation measures 
is significantly correlated with them, and 
the highest correlation is still a negative r 
of .25 with McClelland’s three pictures. 

The other criteria are a little more 
encouraging. For those students who 
finished the first semester, the number of 
undergraduate social activities on a stu- 
dent’s record has one significant correlation 
of .30 with a measure of motivation, and 
that happens to be the questionnaire score 
for social aspiration. That fits what one 
would anticipate if everything went prop- 
erly. The law school social activity cri- 
terion is unrelated to any of the predictors, 
but not too much should have been ex- 
pected of a count of the number of 
social activities and memberships for the 
first semester. The professional ratings had 
one significant correlation of .26 with the 
academic aspiration questionnaire score 
of these students. For students who com- 
pleted the whole year the professional 
ratings had two significant correlations; 
one of .28 with the economic question- 
naire score, and one, significant at the .01 
level, of .37 with the academic question- 
naire score. As a whole, the number of 
significant results from all of these cor- 
relations with the criteria is not very 
impressive. It will be necessary to ex- 
amine additional data to ascertain whether 
their appearance here is due to chance. 


INCIDENTAL FINDINGS 


It is interesting to note that the cor- 
relation between the number of words 
written and the achievement motivation 
score on McClelland’s three pictures, 
alone, is .28 for the group who finished the 
first semester, significant at the .05 level. 
For this group of 55 Ss, the word count 
on the 18-picture test has significant cor- 
relations of .27 with the academic ques- 
tionnaire score, and .35 with the pro- 
fessional questionnaire score. 

For the sake of curiosity we decided 
to ask the Ss to write a brief paragraph 
telling what they expected to do or to 
achieve in law school. This was done 
in order to permit us to count the words 
they wrote and to compare this count with 
the count obtained from their picture 
stories. They were given three minutes 
during which to write. For the group of 
55 students who completed the first se- 
mester, the correlation between these word 
counts was .30, significant at the .05 level. 
However, the additional word count did 
not correlate significantly with any other 
variable studied here. 

The data collected in the present study 
corroborate the suggestion made by Mc- 
Clelland and by Ricciuti and Sadacca 
that a score based on Achievement 
Imagery and Achievement Thema can 
replace the complicated system C score 
for many purposes. The correlation be- 
tween scores from these two systems is 
.97. 


CONCLUSIONS 


These data do not permit us to give 
a clear answer to the first two of the ques- 
tions which prompted the study. These 
questions were based on a faulty assump- 
tion, that there was considerable internal 
consistency among scores derived from 
different pictures. Certainly that assump- 
tion is not supported by the data obtained 
in this study. The fact that the pictures 
do not agree with each other makes the 
reliability suspect, and with scores as un- 
reliable as these would appear to be it is 
impossible to conclude that the variables 
measured are unrelated merely because 
correlations are low. 

A useful argument can be made from these data, however. There is a considerable history of finding significant relationships between picture-story protocols scored for need for achievement and scores from other variables (4, ch. 8). Such significant relationships cannot be obtained consistently without there being some true variance in the picture-story protocol scores. On the basis of this kind of argument one might conclude that the internal consistency estimate of true variance that we have used here is so conservative as to be misleading. There really is more true variance in the n Ach scores, but it is not revealed in comparing one picture with another. This conjecture receives some support from the correlations of the social pictures with the academic and the professional pictures. Those correlations if corrected for attenuation on the basis of reliabilities computed from the intercorrelations between pictures within an area would exceed 1.00 by a considerable amount. It follows that the kind of reliability estimates we have been considering must be too small to adequately reflect the amount of true variance present. The situation is as though each picture had a component of true variance which is specific in terms of many other pictures but is common with some other picture. One must have a large number of other pictures before the specifics can match up with each other to provide some internal consistency. If this is the case the following conclusions are in order: (a) McClelland’s technique cannot readily be modified so that it measures several different needs for achievement. It can only be modified through use of a technique like factor analysis of pictures. The obvious cues in the pictures do not unequivocally reveal the kinds of motivation that they will tap. (b) If internal consistency reliability is desired, one would be wise to use a large number of pictures as stimuli rather than to try to get along with four or five as seems to be the standard practice. (c) The answer to the second of our initial questions, i.e., “Will scores on these several needs for Achievement obtained from McClelland’s technique agree with scores from a questionnaire aimed at the same several needs?” must await the results of the factor analyses mentioned above. 

JOHN R. HILLS 

With regard to the third of our ques- 
tions, i.e., “What relationships are there 
between either type of measure of these 
selected needs and criteria which should 
reflect their presence?” the answer for 
the picture-story scores is clear. Our data 
reveal no relationships which cannot 
easily be attributed to chance. The ques- 
tionnaire measures do not fare quite so 
badly since each scale except the pro- 
fessional one had one significant relation- 
ship with a criterion. The results are not 
very impressive, however. 

One should perhaps note here McClel- 
land’s (3) discussion of the possibility 
that motivation and measured intelligence 
are inextricably confounded by the very 
nature of things. In this regard many 
implicit selection factors, including moti- 
vation, may have functioned in determin- 
ing who would enter law school. On these 
grounds any study attempting to relate 
motivation to achievement among ad- 
vanced professional students might be 
considered by some people to be fore- 
doomed to failure. Although this hy- 
pothesis might be invoked to account for 
the findings of this study, it is to be 
hoped that it will be possible to develop 
measures of motivation sufficiently sensi- 
tive to individual differences to be useful 
even in these situations. To the observer 
and the faculty member it certainly seems 
that some students are more highly moti- 
vated than others even at these advanced 
levels of education. However, the instru- 
ments used to measure motivation in this 
study may not be sufficiently sensitive 
to detect such small differences. 

Our incidental findings lead to the con- 
clusion that in any future studies using 
picture stories to measure n Ach, complex 
scoring systems may as well be abandoned 
in favor of a simple scoring for Achieve- 
ment Imagery and Achievement Thema. 
A simple count of the number of words 
written cannot be substituted for the other 
kinds of scores used here, and does not, 
in this study, relate to the criteria. 


SUMMARY 


An 18-picture Thematic Apperception 
type measure of need for Achievement and 
a questionnaire measure of aspirations 
were given to entering law school students. 
The picture-story test included three of 
the pictures used by McClelland, three 
of Ricciuti and Sadacca’s best pictures, 
and four sets of three pictures each 
selected to reflect needs for economic, 
social, academic, and professional achieve- 
ment. The questionnaire was designed to 
reflect motivation in these same four 
areas. Criteria were available in three of 
these areas. 

The picture-story scores from one pic- 
ture to another did not agree at all well. 
Neither did any of the scores based on 
picture-story protocols correlate signifi- 
cantly with any of the criteria. This may 
be partly due to unreliability of scores 
derived from this type of test. The ques- 
tionnaire scores were satisfactorily reliable 
and displayed several significant correla- 
tions with the criteria. 
Within the last several years there has 
been a considerable research interest in 
the attempt to determine what personality 
changes, if any, are associated with a col- 
lege education. Interestingly enough, the 
major study efforts have been confined to 
relatively small, private liberal arts col- 
leges for women. A limited perusal of col- 
lege bulletins, college charters and of the 
statements by educators will indicate that 
a concern for personality changes as a re- 
sult of a college education is a widespread 
concern and not limited to educators in 
private liberal arts colleges. Additional 
research needs to be undertaken in a 
variety of collegiate settings to determine 
whether or not results obtained in par- 
ticular collegiate settings hold for all col- 
legiate settings; whether or not the ob- 
tained results for highly select female 
students are the same for less select female 
students, highly select male students or 
less select male students; whether or not 
results obtained in a privately supported 
college are the same as those obtained in 
a publicly supported college. The present 
study is one attempt to partially satisfy 
this additional research need. 

There have been at least three large- 
scale studies aimed at determining the 
effectiveness of a college education for 
bringing about personality, attitude or 
ideological system change. These studies 
were conducted at Bennington College by 
Newcomb (3, 4, 5), at Vassar College by 
Sanford et al. (6, 7, 8) and at Sarah 
Lawrence College by Murphy et al. (2). 

In the Newcomb study, it was con- 
cluded that the changes in college stu- 
dents’ attitudes were from freshman 
political conservatism to senior nonconservatism.
This general finding was based
upon studies of concurrent groups of freshmen,
sophomores, juniors and seniors, and
also upon the same persons as they pro- 
gressed through the four-year program. 
Additional information is reported by 
Newcomb (3) for concurrent college class 
groups at Skidmore College and at Wil- 
liams College. The latter groups appeared 
to change in the same direction as did the 
Bennington groups but to a lesser degree 
and from an originally more conservative 
position. 

The staff of the Mary Conover Mellon 
Foundation at Vassar College have re- 
ported extensively upon their findings 
regarding the effect of a Vassar education 
upon a student’s personality. Various ideo- 
logical system measures (particularly of 
Ethnocentrism—E—and Authoritarianism 
—F) were administered to groups of Vas- 
sar students and as Webster reports: 


It is encouraging to note that seniors score 
considerably lower than freshmen on E, F 
and F4 scales. This is true for comparisons
of concurrent freshmen and seniors, and 
for the test-retest comparison of 200 of the 
same persons, first as 1952 freshmen and 
later as 1956 seniors (8). 


The changes reported by Webster were 
in the direction of decreasing readiness to 
accept ethnocentric and authoritarian 
ideology. 

The other large-scale study of higher 
education and personality was conducted 
at Sarah Lawrence College. In this study, 
Murphy (2) concludes that there are 
desirable personality changes which seem 
to be a function of a college education. 

The Bennington, Vassar, and Sarah
Lawrence studies were all conducted in
educational institutions characterized as:
(a) having female students only, who were
highly select in terms of intellectual ability
and socioeconomic status; (b) having
relatively small student bodies in intimate
resident colleges where there was a great
deal of individual attention, student
interaction, and familiarity; and (c) deriving
their support from private sources and
relatively high tuitions.

The present study differs from the Ben- 
nington, Vassar, and Sarah Lawrence 
studies in that it includes observations of 
the same male and female subjects over a 
four-year period. It is also different from 
the other studies in that the college setting 
would be characterized as a relatively 
large, publicly supported, coeducational 
and nonresidential college. The subjects of 
the present study were less select in terms 
of intellectual ability and socioeconomic 
status than were those of Bennington, 
Vassar, and Sarah Lawrence. 


HYPOTHESIS TESTED


It was hypothesized that students in 
attendance for a four-year period in college 
change significantly in ethnocentrism 
during the four-year period, and that the 
change is in the direction of decreasing 
acceptance of ethnocentric ideology. 

PROCEDURE 

A modified form of the Total Ethno- 
centrism Scale: Public Opinion Question- 
naire E (1), developed for use in the 
study of The Authoritarian Personality, 
was administered to 1,030 entering stu- 
dents at San Jose State College in the 
spring of 1953. The E scale is a measure 
of acceptance of ethnocentric ideology 
with ethnocentrism 

... conceived as an ideological system
pertaining to groups and group relations. A
distinction is made between ingroups (those
groups with which the individual identifies 
himself) and outgroups (with which he does 
not have a sense of belonging and which
are regarded as antithetical to the ingroups).
Outgroups are the subjects of negative 
opinions and hostile attitudes; ingroups are
the objects of positive opinions and un- 
critically supportive attitudes, and it is con- 
sidered that outgroups should be socially 
subordinate to ingroups (1, p. 104). 


The E scale has been described as a measure
of generalized prejudice. The form used
in this study was modified only by the
elimination of two of the original 34 items.
The two items that were eliminated,
Numbers 13 and 23 from the Total
Ethnocentrism Scale: Public Opinion
Questionnaire E (1, pp. 110-111), were
judged to be obsolete.

In April 1957, a determination of college 
status was made for each of the originally 
tested 1,030 subjects. It was found that 
of these 1,030 who entered as college 
freshmen in 1953, 315 would finish their 
four-year college degree program at the 
end of the spring semester 1957. Letters 
asking for participation in a public opinion 
survey, blank E scales, and a stamped 
return envelope were sent to these 315 
subjects in May, 1957. Responses were 
obtained from 271 (86% of the 315 sub- 
jects), and this respondent group consti- 
tuted the study sample. 


RESULTS 


To determine whether there were
differences in ethnocentrism, or in change
in ethnocentrism, between male and female
students who had completed a four-year
college education, the data for males and
females were treated separately.

A test of the significance of the differ- 
ence between sample means for correlated 
data was conducted for male 1953 E scores 
and male 1957 E scores. Table 1 contains 
the relevant results for this comparison. It 
is quite apparent that males changed sig- 
nificantly in ethnocentrism over the four- 
year period, and that the change was in 
the direction of decreasing acceptance of 
ethnocentric ideology. 
TABLE 1
A COMPARISON OF MALE 1953 AND 1957 E SCALE SCORES AND OF FEMALE 1953 AND 1957 E SCALE SCORES
(N for males = 137; for females = 134)

             Males                Females
        E-1953    E-1957     E-1953    E-1957
M        92.02     77.13      86.96     66.40
σ        22.96     25.28      26.04     25.15
r             .56                  .54
t            7.64*                9.65*

* Significant beyond .01 level.


A test of the significance of the differ- 
ence between sample means for correlated 
data was conducted for female 1953 E 
scores and female 1957 E scores. Table 1 
also contains the relevant results for this 
comparison. It is quite apparent that fe- 
males changed significantly in ethnocen- 
trism over the four-year period, and the 
change was in the direction of decreasing 
acceptance of ethnocentric ideology. 
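
The test for correlated means can be illustrated by recomputing the t ratios from the summary figures in Table 1. The sketch below is an illustration and not the author's own computing procedure. It assumes the conventional formula, in which the variance of the paired differences is s1² + s2² - 2·r·s1·s2. It also assumes that the unlabeled values .56 and .54 in Table 1 are the correlations between 1953 and 1957 scores, and that the figures are assigned to the sexes as shown in the comments. The function name is hypothetical.

```python
import math


def correlated_t(mean_1, mean_2, sd_1, sd_2, r, n):
    """t ratio for the difference between two correlated (paired) means,
    computed from summary statistics only."""
    # Variance of the paired differences: s1^2 + s2^2 - 2*r*s1*s2
    var_diff = sd_1 ** 2 + sd_2 ** 2 - 2.0 * r * sd_1 * sd_2
    # Standard error of the mean difference
    se_diff = math.sqrt(var_diff / n)
    return (mean_1 - mean_2) / se_diff


# Figures as read from Table 1. ASSUMPTION: which block of figures belongs
# to which sex is inferred from the text and the group sizes.
t_males = correlated_t(92.02, 77.13, 22.96, 25.28, r=0.56, n=137)
t_females = correlated_t(86.96, 66.40, 26.04, 25.15, r=0.54, n=134)

print(round(t_males, 2), round(t_females, 2))
```

Under these assumptions the computed ratios fall within about .05 of the tabled values of 7.64 and 9.65. The small discrepancy may reflect rounding of the correlations or the divisor used for the standard deviations.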

In order to determine whether or not 
there were sex differences in change in 
ethnocentrism, a test of the significance of 
the difference between male and female 
shift-score means was conducted. A shift- 
score was defined as a difference score with 
a constant of one hundred added to elimi- 
nate minus signs (shift score = 1953 E 
minus 1957 E plus 100). Table 2 contains 
the relevant data for this comparison. 
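
The shift-score definition and the comparison of shift-score means can be sketched as follows. This is a minimal illustration in Python. The paired scores are hypothetical and are not data from the study, and the use of a pooled-variance t ratio for independent groups is an assumption, since the text does not state which form of the test was used. Function names are likewise hypothetical.

```python
import math
from statistics import mean, variance


def shift_score(e_1953, e_1957):
    """Shift score as defined in the text: 1953 E minus 1957 E plus 100."""
    return e_1953 - e_1957 + 100


def independent_t(group_a, group_b):
    """Pooled-variance t ratio for two independent groups (ASSUMED form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se


# HYPOTHETICAL (1953 E, 1957 E) pairs, for illustration only.
male_pairs = [(90, 80), (85, 70), (100, 95), (75, 60)]
female_pairs = [(88, 60), (92, 70), (80, 62), (95, 70)]

male_shift = [shift_score(a, b) for a, b in male_pairs]
female_shift = [shift_score(a, b) for a, b in female_pairs]

# A positive t here means the female mean shift exceeds the male mean shift.
t = independent_t(female_shift, male_shift)
print(male_shift, female_shift, round(t, 2))
```

Adding the constant of 100 leaves the difference between group means and the t ratio unchanged; it only keeps individual scores positive when a subject's 1957 score exceeds the 1953 score.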

TABLE 2
A COMPARISON OF MALE AND FEMALE ETHNOCENTRISM SHIFT SCORES
(N for males = 137; for females = 134)

Males Females

M 114.48
σ 22.15
t 2.59*

* Significant beyond .05 level.

The t value for this comparison is
significant beyond the .05 level and is very
close to significance at the .01 level. When
this comparison is coupled with the
comparisons of ethnocentrism change over
time for males and for females (Table 1)
it can be seen that both males and females
change significantly in readiness to accept
ethnocentric ideology, but that the change
for females appears to be greater than that
for males. The factors associated with this
apparent sex difference in shift in
ethnocentrism are at this time unknown,
but a research project has been
designed to investigate this and other
problems.² The need for further work on
sex differences in personality change
associated with a college education had not
come to light in earlier studies because
their samples were limited to subjects of
a single sex.


SUMMARY


Published studies relating to the deter- 
mination of what personality effects are 
associated with a college education were 
reviewed. The major studies that have 
been conducted were done in private, 
liberal arts colleges for women, and in 
each case the investigators reported that 
personality changes had taken place con- 
current with formal higher education. It 
was indicated that research needs to be 
undertaken in a variety of collegiate set- 
tings to determine if similar results would 
be obtained. The present study was de- 
signed to be one such study. 

Data from the Ethnocentrism Scale (1)
were obtained for 271 college students who
started their college education in 1953 and
finished a four-year degree program in
1957. Comparisons of 1953 and 1957 scores
were made for males and for females. A
comparison of 1953-1957 shift scores for
males and for females was also made.

It was concluded that significant changes
took place in the readiness to accept
ethnocentric ideology in both males and
females, and that the direction of the change
was significantly toward less acceptance of
ethnocentric ideology. It was further
concluded that, although both males and
females changed significantly in
ethnocentrism, females tended to change to
a greater degree than did males. This
significant sex difference in personality
change associated with a four-year college
education points up the need for additional
research before generalizations to all
college settings can be made on the basis of
earlier reported research. It would, of
course, also be desirable to obtain data on
noncollege controls to determine whether
changes of the type observed in the present
study occur as a result of the processes of
maturing independently of college
experience.

² A cooperative research contract study
with the US Office of Education entitled
“Personality changes associated with a
college education” was begun February 1, 1958.
PROBLEM SOLVING BEHAVIOR¹
In predicting human learning behavior
two factors usually considered to be most 
relevant are the difficulty of the task being 
learned and the amount of experience the 
individual has had in performing the same 
or a related task. Another factor seldom 
investigated, but which may be relevant, is 
what might be called a “situational” ex- 
pectancy for solving a specific problem. 
Such an expectancy might be learned in a 
situation where there is no possibility of 
immediate success or failure, a condition 
which is quite frequently characteristic of 
real-life situations. That is, in many in- 
stances in which the individual is faced 
with performing any type of task there is 
some uncertainty as to the eventual out- 
come; consequently he develops an expect- 
ancy ranging from complete certainty of 
eventual success to complete certainty of 
eventual failure as he performs the task. 

This study was designed to determine if 
a “situational” expectancy may be estab- 
lished for eventual success or failure in a 
problem solving situation which in turn 
affects the individual’s performance. The 
concept of expectancy utilized is derived 
from a social learning theory of personality 
developed by Rotter in which he defines 
expectancy as the “subjective probability 
held by the individual that a particular 
reinforcement will occur as a function of, 
or in relation to, a specific behavior in a 
given situation or situations” (17, p. 107). 
In this study all Ss were exposed
individually to the same problem-solving
situation and differential expectancies for
eventual success in solving this problem
were established through the use of
different verbal statements.

¹ This paper is based on a dissertation
submitted to Ohio State University in partial
fulfillment of the requirements for the Ph.D.
degree. The author wishes to express her
appreciation to J. B. Rotter for his help in
this study.

There has been little research which 
specifically attempts to measure the effect 
of expectancy for eventual solution of a 
problem on the actual solution of that 
problem, although some other areas of in- 
vestigation may be considered relevant. 
One group of studies deals with the effect 
of anxiety on problem solving behavior. 
Although these studies are not conceptual- 
ized in terms of the effect of anxiety on the 
individual’s expectancy for solution of a 
problem, they are relevant in that they are 
concerned with problem solving behavior 
under conditions of stress or anxiety which 
are characterized by a highly probable 
punishing outcome. Similarly, in the pres- 
ent study one group functions under con- 
ditions designed to create an expectancy 
for eventual failure in a problem solving 
situation—conditions which would pre- 
sumably lead to an “anxiety state” if 
anxiety were to be conceptualized as a 
generalized expectancy for punishment or 
negative reinforcement to occur. The re- 
sults of “anxiety” studies do not demon- 
strate a clear-cut relationship between 
anxiety and performance, although some 
consistent relationships have occurred un- 
der certain specific conditions. For ex- 
ample, in a rote learning situation anxiety 
frequently motivates learning, while in a 
complex problem-solving situation anxiety 
almost consistently results in disorganized 
performance (1, 2, 8, 13, 15, 20). 

A more specific problem in the area of 
anxiety and problem solving behavior is 
the effect of anxiety on performance time, 
which for the present study was con- 
ceptualized in terms of the relationship 







EXPECTANCY FOR SUCCESS IN PROBLEM SOLVING 


between an expectancy for eventual suc- 
cess or failure and decision time. Especially 
relevant here is a study by Lotsof (12) 
who tested the hypothesis that the more 
punishing the alternatives in a choice situ- 
ation, the longer the decision time for 
making a choice. His results indicated a 
direct relationship between the “unpleas- 
antness” of the reinforcement values of the 
alternative behaviors in a choice situation 
and decision time. In the general area of 
“anxiety” research the consensus seems to 
be that increased time is required for com- 
pleting a task under anxiety conditions 
(6, 14). 

Another relevant area of research in- 
cludes those studies which have investi- 
gated the effect of a systematic varying of 
verbal reinforcement on performance. 
Here again, as in the “anxiety” studies, 
there is considerable conflict in results. It 
has been demonstrated that praise facili- 
tates learning in some instances and in- 
hibits learning in others; likewise, reproof 
has been shown both to facilitate and to 
inhibit learning (5, 7, 9, 10, 16). In 
other instances it has been demonstrated 
that a combination of praise and reproof 
is the most effective incentive, and at other 
times that the effect of either is negligible 
(3, 4, 11, 18, 19). This confusion suggests
that more consideration must be
given to the various experimental
conditions involved before meaningful
generalizations can be made.


THE PRESENT STUDY


In the present study an attempt was 
made to test the utility of a concept of 
expectancy in predicting behavior in a 
learning situation. It was hypothesized 
that an expectancy for eventual success in 
solving a particular problem could be es- 
tablished which would in turn affect 
whether the problem would be solved. The 
operation defined for establishment of dif- 
ferential expectancies in this situation was 
the use of different verbal statements for 
the Ss in three experimental groups. These
included encouragement for one group, 
discouragement for a second, and a com- 
bination of the two for a third group. A 
control group was given no verbal rein- 
forcement. It was assumed that verbal 
comments have an effect on the individual 
which may result in the establishment of 
differential expectancies for eventual solu- 
tion of a problem. 

In studying expectancy, the usual pro- 
cedure might be to use practice as the in- 
dependent variable—that is, to control the 
S’s experience with right and wrong re- 
sponses—and to predict expectancy on 
the basis of these differential experiences. 
In this study this expectancy for immedi- 
ate right or wrong responses was controlled 
while varying the expectancy developed 
for eventual success or failure. That is, in 
some situations there may be an eventual 
goal of discovering the correct solution, as 
well as the more immediate goal of making 
a correct response. Here the immediate 
expectancy for right or wrong responses 
was controlled in that the average number 
of correct responses for each group was ap- 
proximately equal during the training 
trials. 

Specifically, the prediction was that sig- 
nificantly more Ss who received verbal 
encouragement (i.e., whose expectancy for 
solving the problem was high) would solve 
the problem logically than would Ss who 
received verbal discouragement (i.e.,
whose expectancy for solving the problem
was lower than that of the encouraged group) or
than would Ss who heard no comments 
(control group). The criterion of a logical 
solution was met when S conceptualized 
the pattern rather than memorized the 
series of responses. In addition, it was pre- 
dicted that the performance of Ss who 
received both encouragement and dis- 
couragement would fall somewhere be- 
tween that of the encouraged group and 
the discouraged group. Finally, it was hy- 
pothesized that with the establishment of 
a high or low expectancy for positive
reinforcement to occur following a response,
the decision time for making each discrete 
response would vary; or more specifically, 
that there would be an inverse relationship 
between the time required for making a 
response and the expectancy for positive 
reinforcement to occur in conjunction with 
that behavior. 


METHOD 


Several criteria had to be met in select- 
ing a problem. First, the problem had to 
be a relatively novel one so that expect- 
ancies for success or failure would be in- 
fluenced as little as possible by past ex- 
periences. Secondly, the problem had to be 
of such difficulty that S would be unable 
to solve it in a designated number of 
trials. In this way E would be able to give 
the same number of verbal reinforcements 
to each S, and at the same time control S’s 
practice or experience so that all Ss would 
have approximately the same number of
right responses during the verbally rein- 
forced trials (before the solution was made 
more available to them). Finally, the 
problem had to be performed individually 
because of the variable number of trials 
required by Ss to solve the problem, and 
because of the questions asked each S at 
the completion of the test. 

A problem was devised which met the 
above criteria. It involved an apparatus 
with two rows of 10 lights each near the 
top, one row controlled by a row of buttons 
on one side of the apparatus and the other 
by another row of buttons on the opposite 
side of the apparatus. The S sat at one
side of the apparatus and controlled the
lights in the bottom row; E sat directly
opposite him and controlled the top row
of lights. The task for S was to
conceptualize the pattern by which E was
flashing a series of lights. Each trial
consisted of S’s predicting which light E
would flash by flashing the corresponding
light in his row, followed by E’s flashing
the correct one in his row of lights. The
pattern or series was
in three steps, so that the solution was ac- 
tually “available” once three trials had 
been completed. As was pointed out earlier, 
however, the problem was intentionally of 
such difficulty that it was almost impos- 
sible to solve until a hint was given after 
30 training trials. (The difficulty level was 
established in a prestudy group.) 

The learning series consisted of 30 training
trials, each of which was followed by a
verbal statement. The test series consisted 
of a maximum of 50 test trials, during 
which no verbal statements were made. At 
the beginning of the test trials the prob- 
lem was made easier by providing a hint 
as to the nature of the mathematical pat- 
tern being used. Throughout both the 
training and test trials S was constantly 
given information that would aid in the 
eventual solution by being shown the cor- 
rect response on each trial, whether he re- 
sponded correctly or incorrectly. 

The three experimental groups were 
treated as follows: 

1. Encouraged group: After each pre- 
diction made by S the correct light was 
flashed and S was encouraged, whether his 
response was right or wrong. The standard 
list of encouragements included such
remarks as “You’re doing fine, it’s very
confusing at first,” etc. Whenever S made a
correct response he was told “good.” 

2. Discouraged group: After each
prediction made by S the correct light was
flashed and S was discouraged, whether his 
response was right or wrong. The standard 
list of discouragements included such re- 
marks as “wrong again, you’re not doing 
very well,” etc. For a correct response S 
was told, “It’s about time you got one 
right.” (If a correct response was made in 
the first five trials, or if there were two
consecutive correct responses, where the 
above would be inappropriate, S was told, 
“That was just luck.”) 

3. Inconsistent group: After half of his 
predictions S was encouraged and after 
half he was discouraged, whether his
response was right or wrong. A standard list
of comments was made up in random order 
from those made to the other two experi- 
mental groups. 

The Ss in the control group were given 
no verbal comments during the training 
trials. They were given the same hint as 
the experimental groups following the 
training trials and continued through the 
test trials with no comments made. 

Following the 30 training trials S was 
stopped and given a hint about the pat- 
tern. He was told that the pattern con- 
sisted of three steps and involved multi- 
plying by a certain number, subtracting a 
number, and adding a number, and that 
the same pattern was repeated over and 
over. The S was then allowed 50 trials 
(test period) in which to solve the prob- 
lem, during which the correct response was 
shown after each of his predictions but no 
comments were made. After S had pre- 
dicted correctly for 10 consecutive trials 
he was asked to describe the pattern. If 
he failed, he was allowed to complete the 
50 trials. Following his performance each 
S was asked several questions concerning 
his reaction to the test situation, the 
method he used in solving the problem, and 
his expectancy for being able to solve the 
problem. 
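
The test-period rule described above (a maximum of 50 trials, with S asked to describe the pattern once he has predicted correctly on 10 consecutive trials) can be sketched in Python. This is only an illustration of the stopping rule; the function name, argument names, and the example sequences are hypothetical. Reaching the run of 10 marks only the point at which S was questioned, since a logical solution also required that S state the operations involved.

```python
def trials_to_criterion(predictions, correct_lights, run_length=10, max_trials=50):
    """Return the 1-based test trial on which a run of `run_length`
    consecutive correct predictions is completed, or None if the run is
    never reached within `max_trials` trials."""
    streak = 0
    pairs = zip(predictions[:max_trials], correct_lights[:max_trials])
    for trial, (guess, target) in enumerate(pairs, start=1):
        # A miss resets the count of consecutive correct predictions.
        streak = streak + 1 if guess == target else 0
        if streak >= run_length:
            return trial
    return None


# HYPOTHETICAL sequences: lights numbered 0-9, 50 test trials.
targets = list(range(10)) * 5
# S misses the first five trials and is correct on every later trial.
guesses = [(t + 1) % 10 for t in targets[:5]] + targets[5:]

print(trials_to_criterion(guesses, targets))
```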

The primary criterion for adequate
solution of the problem demanded that S be
able to conceptualize the pattern—that is, 
be able to state the operations involved. 
However, the data were also analyzed on 
the basis of a criterion allowing for a 
memorized solution. 


RESULTS AND DISCUSSION


Significant differences between the 
groups in number solving the problem 
would indicate the establishment of an 
expectancy for eventual success, as oper- 
ationally defined in the present study, and 
the effect of such an expectancy on behav- 
ior in this specific problem solving situa- 
tion. As shown in Table 1, all of the hy- 
pothesized differences occurred in the 
direction predicted when the criterion for 
success was a logical solution to the prob- 
lem. The difference between the encour- 
aged group and the discouraged group was 
significant beyond the 5% point (P =
.038), and three other differences
approached significance.
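
The group comparisons in Table 1 can be illustrated by computing chi square for a fourfold table of solvers and nonsolvers. The Python sketch below is an illustration only. It assumes the uncorrected chi-square formula (no Yates correction) and obtains a one-tailed probability from the normal curve; neither choice is stated in the text. The solver counts of 17 and 10 out of 30 are those read from Table 1 for the encouraged and discouraged groups, and the function names are hypothetical.

```python
import math


def chi_square_2x2(solved_a, n_a, solved_b, n_b):
    """Uncorrected chi square for solvers vs. nonsolvers in two groups."""
    a, b = solved_a, n_a - solved_a  # group A: solved, not solved
    c, d = solved_b, n_b - solved_b  # group B: solved, not solved
    n = n_a + n_b
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def one_tailed_p(chi_square):
    """One-tailed probability for chi square with 1 df, using the fact that
    chi square with 1 df is the square of a standard normal deviate."""
    z = math.sqrt(chi_square)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Encouraged (17 of 30) vs. discouraged (10 of 30), as read from Table 1.
chi_sq = chi_square_2x2(17, 30, 10, 30)
p = one_tailed_p(chi_sq)
print(round(chi_sq, 2), round(p, 3))
```

The chi square obtained this way agrees with the tabled 3.30. The probability computed from the normal curve is slightly smaller than the reported .038, a difference that may come from the tables or interpolation the author used.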

It should be noted that there was little 
difference between the number of Ss in the 
control group and in the encouraged group 
who solved the problem, while the differ- 
ence between the control group and the 
discouraged group approached significance 
(P = .062), as did the difference between 
the control group and the inconsistent 
group (P = .10). These results would
suggest that individuals performed just as
well under conditions where no deliberate
attempt was made to establish an
expectancy for eventual success as they did
under conditions of verbal encouragement
or high expectancy for success, and better
than under conditions of verbal
discouragement or low expectancy for
success in solving the problem.

TABLE 1
NUMBER OF LOGICAL SOLUTIONS BY GROUP AND STATISTICAL COMPARISON OF DIFFERENCES

                        No. Ss Solving    Group 2          Group 3          Group 4
Group                   Problem           χ²      P        χ²      P        χ²      P
Encouraged (N = 30)     17                3.30    .038     2.400   .068     .07
Discouraged (N = 30)    10                                 .074    .39      2.440   .062
Inconsistent (N = 30)   11                                                  1.684   .10
Control (N = 30)        16

Note: Probabilities are stated in terms of probability point since directionality is predicted, thus allowing for the use of a one-tailed test of significance.

The similarity in performance of Ss in 
the discouraged group and in the incon- 
sistent group suggests that the use of a 
combination of encouraging and discour- 
aging statements has about the same effect 
as the use of discouraging statements 
alone. In other words, in terms of the 
operation used here for defining expect- 
ancy, it was demonstrated in this situation 
that a low expectancy for eventual
success in solving a problem can be established
by intermittent discouragement as readily 
as by 100% discouragement. 


TABLE 2
AVERAGE RESPONSE TIME PER TRIAL FOR 30 TRAINING TRIALS AND FOR SUBSEQUENT TEST TRIALS

                Training Trials    Test Trials
Group           Time               Time
Encouraged       9.40”             12.12”
Discouraged     15.26”             12.68”
Inconsistent    14.38”             12.11”
Control          9.73”             12.78”

















TABLE 3
SIGNIFICANCE OF THE DIFFERENCE (t TEST) BETWEEN GROUPS IN AVERAGE DECISION TIME FOR THE TRAINING TRIALS

Group           Group 2     Group 3     Group 4
Encouraged      4.346*                  .5187
Discouraged                 .5206       4.2193*
Inconsistent                            3.3710*
Control

* Significant at .01 level.
As mentioned earlier, the criterion for
success demanded a logical solution to the 
problem. It was found, however, that 
some Ss memorized the series of responses 
and were unable to conceptualize the steps 
involved in the pattern. Further analysis 
by groups showed that significantly more 
Ss in the low expectancy groups attempted 
to memorize the discrete responses than
did Ss in the high expectancy groups. Al- 
though the number of memorized solutions 
was too small to be reliable, these differ- 
ences between groups in number of Ss re- 
sorting to the “lower level” conceptual 
approach of rote memorization would sug- 
gest that with a low expectancy for success 
S is more likely to resort to a familiar, 
readily available mode of response rather 
than to attempt to conceptualize a solu- 
tion as required in a novel problem solving 
situation. These differences in approach to 
problem-solving behavior warrant further 
investigation. 

Data relative to the hypothesis predict- 
ing an inverse relationship between the 
decision time required for making a re- 
sponse and the expectancy for positive 
reinforcement to occur are presented in
Tables 2 and 3. It can be seen that differ- 
ences between the groups in average re- 
sponse time occurred only during the 
verbally reinforced trials (training period), 
and that the average decision time was 
significantly larger for the groups with a 
low expectancy for positive reinforcement 
to occur (Groups 2 and 3) than it was for 
the higher expectancy group (Group 1) as 
predicted. (An F test applied to evaluate 
the differences between groups during the 
test trials showed no significant differ- 
ences.) These results provide a further 
test of the findings of Lotsof referred to 
earlier. 

Homogeneity of variance was computed 
for all relevant comparisons using Bart- 
lett’s test (21). The results of none of 
these tests approached significance. 

Comments made during the experiment 
and in an interview after each S’s
performance showed that Ss in the discouraged
group expressed considerable hostility
toward E and toward the problem to be
solved, as well as toward themselves for 
failing to solve the problem as quickly as 
they felt they should. This reaction was 
also characteristic of the inconsistent 
group, although it did not occur as fre- 
quently as in the discouraged group. There 
was little expression of hostility in either 
the encouraged or control groups. 


SUMMARY


This study was designed to determine if 
a “situational” expectancy may be estab- 
lished for eventual success in a problem 
solving situation which in turn affects the 
individual’s performance. The findings
support the hypotheses that: (a) Differential
performance results from establishing a 
“situational” expectancy for eventual suc- 
cess or failure which in turn influences an 
individual’s performance in a problem 
solving situation. (b) Encouragement and 
no comments (control) are both superior 
to discouragement and to intermittent en- 
couragement and discouragement. (c) Sig- 
nificantly more Ss in the low expectancy 
groups than in the high expectancy group 
attempt to memorize a solution to the 
problem in contrast to working out a logi- 
cal solution. (d) There is an inverse re- 
lationship between an expectancy for im- 
mediate positive reinforcement and the 
decision time required for making a
response.